Data Sync allows you to send account and visitor metadata directly from Pendo to your preferred cloud storage destination. From there, it can be automatically pulled into your centralized data lake or warehouse on a regular cadence.
Account and visitor exports are independent data types, distinct from event data, and apply at the subscription level. You can export accounts, visitors, or both. Account and visitor exports are delivered to their own subfolders within the storage destination you configure for Pendo Data Sync. You can create either one-time exports or ongoing metadata updates. Ongoing metadata updates send an initial export of all metadata, followed by a daily export of any accounts or visitors whose metadata was updated within the previous 24 hours.
This article is written to help you understand the flow of account and visitor exports in Data Sync and to prepare you for implementation. Specifically, this article:
- Outlines prerequisite steps.
- Explains the flow and frequency of exports.
- Clarifies the hierarchy and purposes of files contained within an export.
- Details how to load an export into your data lake or warehouse.
Prerequisites
- Data Sync configured with a destination. For more information, see:
- The ability to use your data warehouse's commands or SDK to create and load tables from avro files.
Create account and visitor exports
After setting up your destination, you can create one of two types of account and visitor exports:
- A historical One-time export, which exports all accounts or visitors in your subscription at the time of the export.
- An Ongoing metadata updates export, which first exports all accounts or visitors in your subscription and then runs every day to export any accounts or visitors whose metadata was updated in the previous 24 hours.
Note: If Data Sync isn't yet included in your Pendo subscription, you can create a one-day Test export instead, which is available as a trial feature. For more information, see Overview of Pendo Data Sync. You can only export event data as part of the trial.
To create an export:
- Go to Settings > Data Sync and select + Create export in the top-right corner of the page.
- In the screen that opens:
- Enter a meaningful name.
- Choose Accounts and visitors as the data type.
- Choose a data level: Accounts or Visitors.
- Choose an export type: One-time or Ongoing metadata updates.
- Select Next: Export Summary.
- Check the summary and then select Create export.
Each export is delivered to the bucket URL and path provided in the destination configuration. For example, if the path is gs://pendo-data/abcde, Pendo creates a datasync folder at gs://pendo-data/abcde/datasync. Exports are delivered to a subfolder named after the exported entity's data level. In this example, each account export is added to a subfolder under gs://pendo-data/abcde/datasync/<subscription-id>/accounts, and each visitor export is added to a subfolder under gs://pendo-data/abcde/datasync/<subscription-id>/visitors.
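If your ETL job needs to compute where new exports land, the folder layout above can be expressed as a small helper. The following is a minimal Python sketch, not a Pendo-provided tool: the function name export_subfolder and its parameters are hypothetical, and it assumes the destination argument is the bucket URL and path from your destination configuration.

```python
# Hypothetical helper: mirrors the documented layout
# <destination>/datasync/<subscription-id>/<accounts|visitors>.
FOLDER_FOR_LEVEL = {"account": "accounts", "visitor": "visitors"}


def export_subfolder(destination: str, subscription_id: str, data_level: str) -> str:
    """Return the subfolder that holds exports for the given data level."""
    if data_level not in FOLDER_FOR_LEVEL:
        raise ValueError(f"unknown data level: {data_level!r}")
    # Pendo adds a datasync folder under the configured path, then one folder
    # per subscription, then one folder per data level.
    base = destination.rstrip("/")
    return f"{base}/datasync/{subscription_id}/{FOLDER_FOR_LEVEL[data_level]}"


print(export_subfolder("gs://pendo-data/abcde", "<subscription-id>", "visitor"))
# gs://pendo-data/abcde/datasync/<subscription-id>/visitors
```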
View account and visitor exports
To see visitor and account exports that you've already created, go to Settings > Data Sync and open the Accounts and visitors tab. The Exports table provides a list of exports from within the date range specified by filters at the top of the page. This table includes the following columns:
- Name. The title that you gave to the export.
- Data type. Whether the export includes metadata at the account-level, visitor-level, or both.
- Status. Whether the export is still in progress, complete, or failed.
- Export type. Whether the export is a one-time export or ongoing.
- Create time. When the export was created.
- Created by. Who created the export.
- Last successful sync. When the export was last successfully updated.
Use the dropdown menus at the top of the page to filter the items in the Exports table by the following:
- Date range
- Data levels
- Created by
- Statuses
- Export type
To view a summary of a particular export, select the export name in the Exports table. This opens a panel on the right side of the screen that shows the same information as the table, plus additional details such as the size of the export and any messages that might be useful, for example, if the export failed.
Load account and visitor exports
The following information is needed to successfully load Pendo account and visitor metadata from your cloud storage destination into your data lake or warehouse.
Because account and visitor metadata are subscription-level data, Pendo creates separate folders for accounts and visitors inside the subscription folder. Inside each of these subfolders (one for accounts and one for visitors), there are:
- An export manifest (exportmanifest.json), which is updated after an export completes.
- A unique hashed folder for each export (export-uuid), containing a bill of materials file for that export and avro files containing the exported data.
File names alone don't fully explain their contents, and so it's also important to understand:
- The structure of exports. See Account export file hierarchy and Visitor export file hierarchy.
- The purpose of the files in an export. For explanations of each exported file, see File descriptions. For more information about the data included in exported files, see Data Sync schema definitions.
File hierarchies
The following sections provide an overview of the structure of files included in an account export and a visitor export. For more details about the information contained in these files, see File descriptions in this article, and the Data Sync schema definitions article.
Account export file hierarchy
gs://pendo-data/datasync/<subscription-id>/accounts/
├── exportmanifest.json
├── <export-uuid>/
│   ├── billofmaterials.json
│   ├── accounts.avro
│   └── metadataschema.avro
└── <export-uuid>/
    └── ...
If you have a large volume of accounts in your subscription, you might receive multiple account files in the following format: accounts-000.avro, accounts-001.avro, and so on.
Visitor export file hierarchy
gs://pendo-data/datasync/<subscription-id>/visitors/
├── exportmanifest.json
├── <export-uuid>/
│   ├── billofmaterials.json
│   ├── visitors.avro
│   └── metadataschema.avro
└── <export-uuid>/
    └── ...
If you have a large volume of visitors in your subscription, you might receive multiple visitor files in the following format: visitors-000.avro, visitors-001.avro, and so on.
Unique identifiers
You can extract more information about the subscription from Pendo’s Aggregations API using the unique identifier at the top of the hierarchy path: subscription-id.
Each folder of account or visitor data is named after its unique export identifier (export-uuid) and exists at the same level of the hierarchy as the export manifest (exportmanifest.json).
Export management files
There are two lists of export contents:
- The export manifest (exportmanifest.json), which is a concatenated list of daily bills of materials. For more information, see Export manifest in this article.
- The daily bill of materials (billofmaterials.json) within each individual export, which is the JSON representation of the export's contents used by ETL automation to load exported avro files into a data warehouse or data lake. For more information, see Bills of materials in this article.
Account files
The contents of the account export are stored in an account file. For smaller volumes of accounts, this information is stored in accounts.avro. Exports from subscriptions with a large volume of accounts might be split into multiple account files, named in the following format: accounts-000.avro, accounts-001.avro, and so on.
The following code example shows what an entry in an account file might look like:
{
"agent_email": "testUser@pendo.io",
"agent_firstname": "test",
"agent_full_name": "test user",
"agent_lastname": "user",
"agent_user_job_role": "developer",
"lastbrowsername": "Chrome",
"lastbrowserversion": "89.0.4328",
"lastservername": "local.pendo.io:3000",
"accountid": "account1",
"firstvisitMS": 1659627208875,
"idhash": 5419074444,
"id": "accountId123",
"accountids": [
"account1"
],
"lastupdatedMS": 1659627209099,
"lastuseragent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 12_4_0) AppleWebKit/537.36 (KHTML, like Gecko) Cypress/7.7.0 Chrome/89.0.4328.0 Electron/12.0.0-beta.14 Safari/537.36",
"lastoperatingsystem": "Mac OS X",
"lastvisitMS": 1659627208875,
"custom": "{\"myCustomField\": \"test\"}",
"pendo_hubspot": "{\"record_id\": 12345}",
"salesforce": "{}"
}
Fields from the agent metadata group are flattened into top-level fields named agent_<field-name>. Other metadata groups, such as custom metadata or groups populated by integrations, are each condensed into a single JSON-encoded string. The metadata schema file includes the name and type of each field. For more information about the account file's schema definition, see Data Sync schema definitions.
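Because custom and integration groups arrive as JSON-encoded strings, you might decode them before loading. The following minimal Python sketch assumes each record has already been read from the avro file into a dict by an avro reader of your choice. The function name expand_metadata_groups is hypothetical, and the default group names are taken from the example entry above; the groups in your subscription might differ, so check the metadata schema file.

```python
import json


def expand_metadata_groups(record, groups=("custom", "pendo_hubspot", "salesforce")):
    """Return a copy of an exported record with JSON-encoded metadata groups decoded.

    The default group names come from the example entry and are an assumption;
    pass the groups that exist in your own subscription.
    """
    expanded = dict(record)  # leave the original record untouched
    for group in groups:
        value = expanded.get(group)
        if isinstance(value, str) and value:
            # For example, '{"record_id": 12345}' becomes {"record_id": 12345}.
            expanded[group] = json.loads(value)
    return expanded


record = {
    "id": "accountId123",
    "agent_email": "testUser@pendo.io",
    "custom": '{"myCustomField": "test"}',
    "salesforce": "{}",
}
expanded = expand_metadata_groups(record)
print(expanded["custom"]["myCustomField"])  # test
```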
Visitor files
The contents of the visitor export are stored in a visitor file. For smaller volumes of visitors, this information is stored in visitors.avro. Exports from subscriptions with a large volume of visitors might be split into multiple visitor files, named in the following format: visitors-000.avro, visitors-001.avro, and so on.
The following code example shows what an entry in a visitor file might look like:
{
"agent_email": "testUser@pendo.io",
"agent_firstname": "test",
"agent_full_name": "test user",
"agent_lastname": "user",
"agent_user_job_role": "developer",
"lastbrowsername": "Chrome",
"lastbrowserversion": "89.0.4328",
"lastservername": "local.pendo.io:3000",
"accountid": "account1",
"firstvisitMS": 1659627208875,
"idhash": 5419074444,
"id": "visitorId123",
"accountids": [
"account1"
],
"lastupdatedMS": 1659627209099,
"lastuseragent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 12_4_0) AppleWebKit/537.36 (KHTML, like Gecko) Cypress/7.7.0 Chrome/89.0.4328.0 Electron/12.0.0-beta.14 Safari/537.36",
"lastoperatingsystem": "Mac OS X",
"lastvisitMS": 1659627208875,
"custom": "{\"myCustomField\": \"test\"}",
"pendo_hubspot": "{\"record_id\": 12345}",
"salesforce": "{}"
}
Fields from the agent metadata group are flattened into top-level fields named agent_<field-name>. Other metadata groups, such as custom metadata or groups populated by integrations, are each condensed into a single JSON-encoded string. The metadata schema file includes the name and type of each field. For more information about the visitor file's schema definition, see Data Sync schema definitions.
Metadata schema file
Each export includes a metadataschema.avro file, which contains the export file's schema. You can use it to make sure the metadata is imported properly into your ETL tool. The following code example shows what an entry in the metadataschema.avro file might look like.
File descriptions
The following table provides a description of the files contained in an export. For more information about the data included in exported files, see Data Sync schema definitions.
The file names listed in the bill of materials are relative. To obtain an absolute file name, prepend the rootUrl field from the export manifest, followed by a slash, to the relative file name. The rootUrl also corresponds to the path of the billofmaterials.json file (excluding its filename).
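Building an absolute file name from the manifest amounts to a careful string join. This is a minimal Python sketch, assuming rootUrl values look like the ones in the export manifest example (no trailing slash); absolute_file_url is a hypothetical helper name, and the placeholder path follows the file hierarchy shown earlier.

```python
def absolute_file_url(root_url: str, relative_name: str) -> str:
    """Join an export's rootUrl with a relative file name from its bill of materials."""
    # Normalize slashes so exactly one separator sits between the two parts.
    return f"{root_url.rstrip('/')}/{relative_name.lstrip('/')}"


root_url = "gs://pendo-data/datasync/<subscription-id>/accounts/<export-uuid>"
print(absolute_file_url(root_url, "billofmaterials.json"))
# gs://pendo-data/datasync/<subscription-id>/accounts/<export-uuid>/billofmaterials.json
```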
Content type | File name | Description |
Export management file: Record of files across several exports, plus some metadata. | exportmanifest.json | Contains a concatenated list of files for each part of the daily export, covering a rolling window of exports generated within the last 30 days. The counter increments with each new export received; zero (0) means there's nothing to do and the content can be deleted. |
Export management file: Record of export contents. | billofmaterials.json | A JSON representation of the export contents. This is used by ETL automation to load exported avro files into a data warehouse. |
Account file. | accounts.avro | Contains the account metadata export. |
Visitor file. | visitors.avro | Contains the visitor metadata export. |
Metadata schema file. | metadataschema.avro | Contains the export's schema. |
Bills of materials
The bill of materials documents the contents of the export.
Account export's bill of materials
The following code snippet is an example of what the billofmaterials.json file looks like for an account Data Sync export.
{
"timestamp": "2023-02-16T20:21:11Z",
"numberOfFiles": 65,
"subscription": {
"displayName": "ACME CRM",
"id": "6591622502678528"
},
"accounts": {
"count": 243,
"files": [
"accounts.avro"
]
},
"metadataSchemaFile": [
"metadataschema.avro"
],
"exportType": [
"One-time"
],
"dataLevel": "account"
}
exportType can be one of the following:
- ["One-time"] if the export is a one-time export.
- ["Ongoing metadata updates"] if the export is an ongoing metadata updates export.
In the accounts section:
- count is the number of accounts exported.
- files is the list of files that contain the exported accounts. If multiple files are sent, they are named accounts-000.avro, accounts-001.avro, and so on.
If no accounts are exported, the accounts section isn't present.
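An ETL job can read the bill of materials to decide which avro files to load, and it should tolerate the data section being absent. This is a minimal Python sketch, assuming the bill of materials has already been downloaded and parsed with json; data_files_from_bom is a hypothetical name. The same logic applies to visitor exports, where dataLevel is "visitor" and the section is visitors.

```python
import json

# Maps the bill of materials dataLevel value to the name of its data section.
SECTION_FOR_LEVEL = {"account": "accounts", "visitor": "visitors"}


def data_files_from_bom(bom: dict) -> tuple:
    """Return (record count, list of data file names) from a parsed bill of materials."""
    section = bom.get(SECTION_FOR_LEVEL[bom["dataLevel"]])
    if section is None:
        # The section is omitted when no accounts or visitors were exported.
        return 0, []
    return section["count"], list(section["files"])


bom = json.loads("""
{
  "dataLevel": "account",
  "accounts": {"count": 243, "files": ["accounts.avro"]},
  "metadataSchemaFile": ["metadataschema.avro"],
  "exportType": ["One-time"]
}
""")
count, files = data_files_from_bom(bom)
print(count, files)  # 243 ['accounts.avro']
```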
Visitor export's bill of materials
The following code snippet is an example of what the billofmaterials.json file looks like for a visitor Data Sync export.
{
"timestamp": "2023-02-16T20:21:11Z",
"numberOfFiles": 65,
"subscription": {
"displayName": "ACME CRM",
"id": "6591622502678528"
},
"visitors": {
"count": 243,
"files": [
"visitors.avro"
]
},
"metadataSchemaFile": [
"metadataschema.avro"
],
"exportType": [
"One-time"
],
"dataLevel": "visitor"
}
exportType can be one of the following:
- ["One-time"] if the export is a one-time export.
- ["Ongoing metadata updates"] if the export is an ongoing metadata updates export.
In the visitors section:
- count is the number of visitors exported.
- files is the list of files that contain the exported visitors. If multiple files are sent, they are named visitors-000.avro, visitors-001.avro, and so on.
If no visitors are exported, the visitors section isn't present.
Export manifest
Visitor and account exports are managed by separate export manifests.
The export manifest is a key file for reading and ingesting exports. The export manifest is a concatenation of multiple bills of materials, with some additional metadata. It consists of a rolling record of the past 30-day period of Data Sync activity, regardless of which dates of data are exported.
While the bill of materials provides details of everything in a single export, the export manifest operates at a higher level, allowing you to keep track of all of your exports over time. Each export listed in the manifest has a counter, which you can use to track which exports you've already processed.
The following code snippet is an example of what the exportmanifest.json file looks like for a Data Sync export, omitting the parts that overlap with the billofmaterials.json file.
{
"exports": [
{
// complete billofmaterials object present but omitted for brevity
"exportType": [...],
"counter": 1,
"finishTime": "2023-03-03T14:10:15.311651Z",
"storageSize": 12130815,
"rootUrl": "gs://pendo-data/datasync/6591622502678528/-323232/0f39bdf6-09c2-4e4d-6d4f-b02c961d8aaf"
},
{
// complete billofmaterials object present but omitted for brevity
"exportType": [...],
"counter": 2,
"finishTime": "2023-03-03T14:20:12.9489274",
"storageSize": 23462682,
"rootUrl": "gs://pendo-data/datasync/6591622502678528/-323232/b979502c-1a01-4569-74cf-e4a7f5049d8f"
}
],
"generatedTime": "2023-03-05T04:17:59.853205005Z"
}
The export manifest only reflects exports whose files have been completely loaded into your cloud storage. It never contains a partial export: an export listed in the export manifest is always complete.
Updates to exported data
Ongoing metadata updates include new versions of metadata that was previously exported. Your ETL process should include the appropriate "drop and replace" logic to ensure that a duplicate account (Account ID) or visitor (Visitor ID) isn't accidentally ingested.
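The core of drop-and-replace logic is keying rows by account or visitor ID, so that a newer export overwrites the earlier row instead of adding a second one. This in-memory Python sketch is illustrative only: it assumes each record is a dict whose id field holds the Account ID or Visitor ID, as in the example entries, and that exports are applied in counter order. apply_export is a hypothetical name.

```python
def apply_export(table: dict, records: list) -> None:
    """Merge one export's records into a table keyed by account or visitor ID."""
    for record in records:
        # Assigning by ID drops any previously stored row for that ID and
        # replaces it, so each ID appears only once in the table.
        table[record["id"]] = record


table = {}
apply_export(table, [{"id": "visitorId123", "lastupdatedMS": 1659627209099}])
apply_export(table, [{"id": "visitorId123", "lastupdatedMS": 1659627300000}])  # later update
print(len(table))  # 1
```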
Example load flow
This example creates a separate table in the data warehouse for each type of file (accounts or visitors, and the metadata schema). You can load data to suit your needs as long as updated records correctly replace previous ones.
Step 1. Read the most recent export manifest
Read the most recent exportmanifest.json file to find all exports that haven't been processed since data was last loaded. You can use the counter field as a marker for load progress.
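A minimal Python sketch of this step, assuming you persist the highest counter you have already loaded (for example, in a state table of your own) and that exportmanifest.json has been downloaded and parsed with json. The names unprocessed_exports and last_counter are hypothetical, and the rootUrl values are placeholders.

```python
import json


def unprocessed_exports(manifest: dict, last_counter: int) -> list:
    """Return manifest entries with a counter above last_counter, oldest first."""
    pending = [entry for entry in manifest.get("exports", []) if entry["counter"] > last_counter]
    return sorted(pending, key=lambda entry: entry["counter"])


manifest = json.loads("""
{
  "exports": [
    {"counter": 2, "rootUrl": "gs://pendo-data/datasync/<subscription-id>/accounts/<export-uuid-2>"},
    {"counter": 1, "rootUrl": "gs://pendo-data/datasync/<subscription-id>/accounts/<export-uuid-1>"}
  ],
  "generatedTime": "2023-03-05T04:17:59.853205005Z"
}
""")

for entry in unprocessed_exports(manifest, last_counter=1):
    print(entry["counter"], entry["rootUrl"])
# After loading, store the highest counter processed as the new last_counter.
```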
Step 2. Iterate over each entry in the exports list
Process each entry in the list and load its files into the appropriate tables.
The latest version of the Pendo metadata schema file is sent in each export. When loading these files into your data warehouse, make sure to replace the previously loaded schema with each export and to use avro logical type mapping.
Load metadataschema. If the metadataschema table doesn't exist, create it. If the metadataschema table exists, drop all data.
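Using Python's built-in sqlite3 module as a stand-in for your warehouse, this step can be sketched as follows. The two-column layout (field_name, field_type) is a hypothetical simplification: the actual structure of metadataschema.avro isn't shown in this article, so map its records to your own columns. reload_metadata_schema is also a hypothetical name.

```python
import sqlite3


def reload_metadata_schema(conn: sqlite3.Connection, rows: list) -> None:
    """Create metadataschema if needed, drop all of its data, then load the new rows."""
    with conn:  # commits on success, rolls back if a statement fails
        # Hypothetical two-column layout; replace with your real schema mapping.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS metadataschema (field_name TEXT, field_type TEXT)"
        )
        conn.execute("DELETE FROM metadataschema")  # drop all previously loaded schema data
        conn.executemany("INSERT INTO metadataschema VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
reload_metadata_schema(conn, [("agent_email", "string")])
reload_metadata_schema(conn, [("agent_email", "string"), ("lastvisitMS", "long")])
print(conn.execute("SELECT COUNT(*) FROM metadataschema").fetchone()[0])  # 2
```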
Step 3. Iterate over accounts or visitors
Load each account or visitor file from the list in the .... with logical type mapping. If the account or visitor table doesn't exist, create it. To avoid duplication, if the account or visitor table does exist, drop any account or visitor metadata for the given Account ID or Visitor ID from the table and append data from the account or visitor file to the account or visitor table.
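Continuing with sqlite3 as a stand-in warehouse, the delete-then-append step for accounts or visitors might look like the following. This is a sketch under stated assumptions: records are dicts already read from the avro files, the id field holds the Account ID or Visitor ID, and the three-column table (id, lastupdatedMS, and a payload column holding the full record as JSON) is a hypothetical simplification of a real, schema-mapped table. load_entity_rows is a hypothetical name.

```python
import json
import sqlite3

ALLOWED_TABLES = {"accounts", "visitors"}  # table names are interpolated, so allow-list them


def load_entity_rows(conn: sqlite3.Connection, table: str, records: list) -> None:
    """Drop existing rows for the exported IDs, then append the exported records."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    latest = {record["id"]: record for record in records}  # one row per ID per load
    with conn:  # commits on success, rolls back if a statement fails
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (id TEXT, lastupdatedMS INTEGER, payload TEXT)"
        )
        # Drop any metadata already stored for the exported IDs...
        conn.executemany(f"DELETE FROM {table} WHERE id = ?", [(i,) for i in latest])
        # ...then append the new versions.
        conn.executemany(
            f"INSERT INTO {table} (id, lastupdatedMS, payload) VALUES (?, ?, ?)",
            [(r["id"], r.get("lastupdatedMS"), json.dumps(r)) for r in latest.values()],
        )


conn = sqlite3.connect(":memory:")
load_entity_rows(conn, "accounts", [{"id": "a1", "lastupdatedMS": 1}, {"id": "a2", "lastupdatedMS": 1}])
load_entity_rows(conn, "accounts", [{"id": "a1", "lastupdatedMS": 2}])  # ongoing update for a1
print(conn.execute("SELECT id, lastupdatedMS FROM accounts ORDER BY id").fetchall())
# [('a1', 2), ('a2', 1)]
```

Deleting by ID before appending keeps exactly one row per account or visitor, even when the same ID appears in several ongoing exports.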