Pendo Data Sync allows you to export avro files to a Google Cloud Storage (GCS) bucket, Amazon S3, or Microsoft Azure Storage. We use the avro file type because it's well-suited for use in "extract, transform, load" (ETL) pipelines. This article describes the avro files that are exported as part of the Data Sync process and the values contained in each type of avro file.
Data sync exports
Pendo exports three types of avro files as part of the Data Sync process: Event exports, Visitor exports, and Account exports.
Dates and timestamps are in Coordinated Universal Time (UTC). For more information, see Timestamps and time zones in Data Sync export handling.
Event exports
Exports include event files and event type definitions, as well as contextual information so that you can get full value out of the events without having to do additional lookups or API calls. To this end, exports contain:
- An event file for all events, even those your team has yet to define.
- An event file for each defined event type ("matchable"): Page, Feature, and Track Event.
- A definition file for each event type, plus Guide Event, including metadata and details about each defined event.
- An export manifest that references all the files outlined above. This is to help you load data from your cloud storage into your data warehouse.
Visitor exports
Visitor exports include visitor files and the visitor metadata schema definition. Exports contain:
- One or more visitor files for all visitors exported. If you have a large number of visitors, we might send them in batched files named in the following format: visitors-000.avro, visitors-001.avro, and so on.
- A metadata schema definition file named metadataschema.avro.
- An export manifest that references all the files outlined above. This is to help you load data from your cloud storage into your data warehouse.
Account exports
Account exports include account files and the account metadata schema definition. Exports contain:
- One or more account files for all accounts exported. If you have a large number of accounts, we might send them in batched files named in the following format: accounts-000.avro, accounts-001.avro, and so on.
- A metadata schema definition file named metadataschema.avro.
- An export manifest that references all the files outlined above. This is to help you load data from your cloud storage into your data warehouse.
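The batched naming convention above lends itself to a small helper that picks the batch files out of a folder listing and orders them by batch number. The following is a minimal sketch, not part of Data Sync itself: the function name sort_batches and the sample listing are hypothetical, and it assumes file names follow the visitors-000.avro and accounts-000.avro pattern exactly.

```python
import re

# Matches batch file names such as "visitors-000.avro" or "accounts-012.avro".
BATCH_NAME = re.compile(r"^(visitors|accounts)-(\d+)\.avro$")


def sort_batches(file_names, kind):
    """Return the batch files for `kind` ("visitors" or "accounts") in numeric order."""
    batches = []
    for name in file_names:
        match = BATCH_NAME.match(name)
        if match and match.group(1) == kind:
            batches.append((int(match.group(2)), name))
    return [name for _, name in sorted(batches)]


# Hypothetical listing of one visitor export folder.
listing = ["visitors-001.avro", "metadataschema.avro", "visitors-000.avro"]
print(sort_batches(listing, "visitors"))
```

Sorting on the parsed number rather than the raw string keeps the order stable even if the number of digits in the suffix ever changes.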
Events file schema
Our events file schema defines the structure and format of the events data we export in avro files for Data Sync. The schema applies to all types of event file:
- All Events
- Page
- Feature
- Track Event
Guide events are contained within the above event-type files, and so don't have their own event file, but do have their own definition file.
The schema definition that follows includes information about the different types of events that can be generated, the fields that are associated with each event type, and the data types that are used to define each field.
Metric | Type | Description |
---|---|---|
matchableId |
STRING | Reference to the matchable (Page, Feature, or Track Event) that has a rule matching this event. Not present in the All Events file. |
periodId |
DATE | Convenience field to assist in data warehouse loading, equal to the date portion of the browserTimestamp for a given event. All dates are in UTC and in the following format: YYYY-MM-DD . |
visitorId |
STRING | The Visitor ID for the event. |
accountId |
STRING | The Account ID for the event. An empty string is used when no account information is available. |
browserTimestamp |
LONG | Timestamp of the event. These can be loaded into a data warehouse as dates with the use_avro_logical_types flag. |
country |
STRING | Country associated with the remoteIp . This field can be blank. |
destinationStepId |
STRING | Relates to guideAdvanced events that specify the ID of the destination step in the guide showing flow. This can be the previous or next step ID. This shows up in the singleEvents and guideEvents sources. |
elementPath |
STRING | For web events. This is either empty or a CSS-style string specifying the DOM element related to the event. For mobile, this is a JSON-formatted description of the widget related to this event. |
eventClass |
STRING | Either ui or track. |
eventId |
STRING | Unique event identifier. |
eventSource |
STRING | Event sources include: email (events that originated from an actual email), mobile (events originating in a mobile app), and web (events originating in a web page). |
eventType |
STRING | Type of event. Examples include: change , indicating that a visitor has changed an element in the app; click , indicating that a visitor has clicked on an element in the app; focus , indicating that a visitor has focused on an element in the app; group , an event in response to the "group" call with a Twilio Segment integration; guideActivity , indicating that a visitor has interacted with a Pendo guide; guideDismissed , indicating that a visitor has dismissed a Pendo guide; guideSeen , indicating that a visitor has seen a Pendo guide; identify , indicating that the app identified the visitor or account identity; load , indicating that a page was loaded; meta , indicating that the app sent visitor and account metadata; and pollResponse , indicating that a visitor submitted a Pendo poll. |
guideId |
STRING | Unique identifier of the guide generating guide- and poll-related events. This field can be blank. |
guideSeenReason |
STRING | For the guideSeen event type, why the guide was displayed to the user. |
guideSeenTimeoutMS |
LONG | For guideTimeout events, the amount of time in milliseconds that the agent waited to show a specific guide step before sending a guideTimeout event. |
guideSessionId |
STRING | Identifier for the set of guides and other deliverables that were loaded together. This ID changes every time the client requests guides, and all events that happen between two loads carry the same ID. |
guideSnoozeDurationMS |
LONG | For the guideSnoozed event type, the amount of time the guide was snoozed for in milliseconds. |
guideStepId |
STRING | Unique identifier of the guide step generating the guide- or poll-related event. This field can be blank. |
language |
STRING | For the guideSeen event type, the language of the guide being displayed. This field can be blank. |
latitude |
FLOAT | The latitude of the remoteIp for the event. This field can be blank. |
longitude |
FLOAT | The longitude of the remoteIp for the event. This field can be blank. |
oldVisitorId |
STRING | The anonymous Visitor ID that was previously assigned to the visitor before they were identified by Pendo through authentication. |
loadDurationMS |
LONG | For load events, the amount of time it took for the webpage to render in milliseconds. This doesn't include the time it takes for dynamic parts of the page to load. |
pollId |
STRING | The identifier of the poll that generated any pollResponse events. |
pollResponse |
STRING | The JSON-formatted response to the poll that generated a pollResponse event. This can be an index into data that's only available in the poll itself. If the eventType isn't pollResponse , there is no pollResponse property. |
pollType |
STRING | The type of poll that generated a pollResponse event. This could be NumberScale , PositiveNegative , FreeForm , or PickList . |
propertiesJson |
STRING | A JSON-formatted map of all the user-defined event properties for this event, including page parameters. If historical metadata has been promoted to the visitor or the account, this JSON includes a visitormetadata and accountmetadata string, respectively. |
region |
STRING | The United States region, presented as a two-letter code for the state, of the remoteIp for the event. This field can be blank. |
remoteIp |
STRING | The remoteIp that generated the event. If this data isn't collected, 0.0.0.0 is stored instead. This can also be an IPv6 address. Some proxies and mobile networks prevent useful IP addresses from being collected. |
server |
STRING | The server name portion of the URL for the event. This field can be blank. |
uiElementActions |
STRING | The element actions, such as openLink or guideSnoozed , associated with a guide interaction where a guideActivity event is sent by the agent. This appears in the following sources: singleEvents and guideEvents . |
uiElementId |
STRING | The guide element's unique identifier when a UI element inside a guide is clicked on, which sends a guideActivity event to the agent. This appears in the following sources: singleEvents , guideEvents , guideElementClick , and guideElementClickEver . |
uiElementText |
STRING | The guide element's text for when the agent sends the field as part of a guideActivity event. When a guideActivity event is sent with the guide element's text, ui_element_text should appear in the following sources: singleEvents and guideEvents . If a guideActivity event is sent without the ui_element_text field, we still process it. If a subscription has opted to exclude all text, ui_element_text isn't stored, even if the event is sent with it. |
uiElementType |
STRING | The type of element that was clicked when a guideActivity event is being sent. This appears in the following sources: singleEvents and guideEvents . |
url |
STRING | The normalized URL for the page that generated a web event. For mobile events, the URL is a JSON representation of the screen structure. |
userAgent |
STRING | The user agent from the HTTPS request when a web event is received. For mobile events, the userAgent is the textual representation of the device type that generated the event. Both types of value are properly parsed by the user agent parsing functions with aggregations. |
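Because periodId is described as the date portion of browserTimestamp in UTC, you can recompute it as a sanity check after loading. This is a minimal sketch under two assumptions that the schema above doesn't state explicitly: that browserTimestamp is an epoch value in milliseconds, and that the decoded periodId prints as YYYY-MM-DD. The function name and the sample event are hypothetical.

```python
from datetime import datetime, timezone


def period_id_from_timestamp(browser_timestamp_ms):
    """Derive the UTC date string (YYYY-MM-DD) from an epoch timestamp.

    Assumption: the timestamp is in milliseconds since the Unix epoch.
    """
    moment = datetime.fromtimestamp(browser_timestamp_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


# Hypothetical decoded event record containing only the two fields of interest.
event = {"browserTimestamp": 86_400_000, "periodId": "1970-01-02"}

# str() also covers the case where the avro reader returns a date object.
matches = period_id_from_timestamp(event["browserTimestamp"]) == str(event["periodId"])
print(matches)
```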
Event type definitions
The definitions for Pages, Features, and Track Events are included in their own avro files, which can be added to the event data contained in the relevant event file.
Pages
Metric | Type | Description |
---|---|---|
pageId |
STRING | Page identifier. |
kind |
STRING | Description of the type of object. This will always be Page . |
lastUpdatedAt |
LONG | Epoch timestamp for when the Page was last updated in milliseconds. |
createdAt |
LONG | Epoch timestamp for when the Page was created in milliseconds. |
rulesJson |
STRING | The regex rules that define the Page if created by Pendo's classic (legacy) Designer. This field is empty for Pages that were created with the Visual Design Studio instead of the classic Designer. |
name |
STRING | The name given to the Page. |
isCoreEvent |
BOOLEAN | Whether the event is a Pendo Core Event. |
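One way to add Page definitions to event data is a lookup keyed on the Page identifier. The sketch below assumes that the matchableId in a Page event file corresponds to the pageId in the Page definitions file, which the schema implies but doesn't state outright. It works on records already decoded into Python dictionaries; the function name, the pageName key, and the sample records are hypothetical.

```python
def attach_page_names(events, page_definitions):
    """Return copies of the events with a pageName key looked up by matchableId."""
    names_by_id = {page["pageId"]: page["name"] for page in page_definitions}
    enriched = []
    for event in events:
        # Events with no matching definition get None rather than raising.
        page_name = names_by_id.get(event.get("matchableId"))
        enriched.append({**event, "pageName": page_name})
    return enriched


# Hypothetical decoded records.
pages = [{"pageId": "page-1", "kind": "Page", "name": "Dashboard"}]
events = [
    {"eventId": "e1", "matchableId": "page-1", "eventType": "load"},
    {"eventId": "e2", "matchableId": "page-9", "eventType": "load"},
]
enriched_events = attach_page_names(events, pages)
print([event["pageName"] for event in enriched_events])
```

The same pattern would apply to Features and Track Events by swapping in featureId or trackTypeId as the lookup key.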
Features
Metric | Type | Description |
---|---|---|
featureId |
STRING | Feature identifier (unique for each subscription). |
kind |
STRING | Description of the type of object. This will always be Feature . |
lastUpdatedAt |
LONG | Epoch timestamp for when the Feature was last updated in milliseconds. |
createdAt |
LONG | Epoch timestamp for when the Feature was created in milliseconds. |
pageId |
STRING | The Page identifier containing the Feature. |
name |
STRING | The name given to the Feature. |
isCoreEvent |
BOOLEAN | Whether the event is a Pendo Core Event. |
Track Events
Metric | Type | Description |
---|---|---|
trackTypeId |
STRING | Track Event identifier (unique for each subscription). |
kind |
STRING | Description of the type of object. This will always be TrackType . |
lastUpdatedAt |
LONG | Epoch timestamp for when the Track Event was last updated in milliseconds. |
createdAt |
LONG | Epoch timestamp for when the Track Event was created in milliseconds. |
eventPropertyNames |
STRING | The names of the Track Event properties included. |
name |
STRING | The name given to the Track Event. |
isCoreEvent |
BOOLEAN | Whether the event is a Pendo Core Event. |
Guides
Metric | Type | Description |
---|---|---|
guideId |
STRING | Guide identifier (unique for each subscription). |
kind |
STRING | Description of the type of object. This will always be Guide . |
lastUpdatedAt |
LONG | Epoch timestamp for when the guide was last updated in milliseconds. |
createdAt |
LONG | Epoch timestamp for when the guide was created in milliseconds. |
state |
STRING | The visibility state of the guide: draft , staged , public , or disabled . |
name |
STRING | The name given to the Guide. |
emailState |
STRING | The state of email backup for NPS: draft when disabled, and public when enabled. |
launchMethod |
STRING | The set of launch methods a guide might use, delimited by a hyphen. |
isMultiStep |
BOOLEAN | Whether a guide has more than one step. |
isTraining |
BOOLEAN | Whether the guide belongs to an "Adopt for Partners" end-user application. |
recurrence |
LONG | The recurrence period for an NPS guide in milliseconds. |
recurrenceEligibilityWindow |
LONG | The length of time in milliseconds for which an individual visitor is eligible for an NPS guide when even distribution is enabled. |
attributeJson |
STRING | JSON representation of guide attributes, including the type of guide, the badge description, the types of devices the guide is enabled for, and the last version of the Visual Design Studio that the guide was edited on. |
audience |
STRING | The logic defining the visitors targeted by the guide. |
audienceUiHint |
STRING | A more human-readable representation of the segment that was applied to the guide. |
resetAt |
LONG | The timestamp for when the guide was last reset. |
publishedAt |
LONG | The timestamp for when the guide was most recently published. |
steps |
RECORD | Guide steps containing STRING values for guideStepId , name , pageId , appRelayUrl , and elements . |
At this time, we don't send poll questions in the guide object schema.
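Since launchMethod packs several launch methods into one hyphen-separated string, splitting it is usually the first step before filtering guides by launch method. A minimal sketch follows; the function name and the sample value auto-badge are illustrative assumptions, not values taken from an actual export.

```python
def split_launch_methods(launch_method):
    """Split a hyphen-separated launchMethod string into its parts."""
    # Filtering out empty parts makes an empty string yield an empty list.
    return [part for part in launch_method.split("-") if part]


# "auto-badge" is a hypothetical example value.
print(split_launch_methods("auto-badge"))
```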
Visitors file schema
Our visitors file schema defines the structure and format of the visitor data that we export in avro files for Data Sync visitor exports.
Metric | Type | Description |
---|---|---|
id |
STRING | Unique identifier for the visitor. |
accountids |
ARRAY | List of STRING values that are unique Account IDs to which the visitor belongs. |
accountid |
STRING | The Account ID last associated with the visitor. |
lastservername |
STRING | The most recent server name. |
firstvistMS |
LONG | The timestamp in milliseconds when an event was first captured for the visitor. |
idhash |
INT | The hash of the Visitor ID. |
lastbrowsername |
STRING | The most recent browser name. |
lastbrowserversion |
STRING | The most recent browser version. |
lastoperatingsystem |
STRING | The most recent operating system. |
lastvisitMS |
LONG | The timestamp in milliseconds when an event was last recorded for the visitor. |
lastupdatedMS |
LONG | The timestamp in milliseconds when the visitor was last updated. |
lastuseragent |
STRING | The most recent user agent (unparsed). |
identifiedvisitoratMS |
LONG | If an anonymous Visitor ID is merged with an identified Visitor ID through Pendo identity management, this field shows the timestamp in milliseconds for when the visitor was identified. |
identifiedvisitorid |
STRING | If an anonymous Visitor ID is merged with an identified Visitor ID through Pendo identity management, this field shows the identified Visitor ID. This allows for a unified view of visitor journeys. |
<fieldname>_<APP>_<ID> |
Varies depending on the field: STRING, LONG, or INT | In a multi-app subscription, Pendo sends <fieldname>_<APP>_<ID> values for any of the above metadata fields that might vary across applications. Metadata fields for which this is possible have isPerApp set to true in the metadata schema file. |
agent_<fieldname> |
Varies depending on the field. | A field is created in the visitor avro file for each agent metadata field. Field types depend on how you set up your data in the Data Mappings page in Pendo. For a mapping of Pendo metadata field types to avro types, see Pendo to avro data mappings. |
<metadata_group> |
STRING | For any metadata group other than agent , we send a JSON representation of the metadata group and all its fields. The metadata schema file contains the name, type, and other information for each field in the metadata group. |
incompleteFields |
List of STRINGs | List of metadata fields with values that were incompatible with the type specified in your metadata schema. |
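Metadata groups other than agent arrive as JSON strings, so they need one extra decoding step after the avro record is read. The sketch below shows that step on a record already decoded into a Python dictionary. The function name, the group name custom, and the sample field values are hypothetical; check your metadata schema file for the real group and field names.

```python
import json


def decode_metadata_groups(record, group_names):
    """Parse the JSON string stored under each metadata group name."""
    decoded = {}
    for group in group_names:
        raw = record.get(group)
        # Treat a missing or empty value as an empty group.
        decoded[group] = json.loads(raw) if raw else {}
    return decoded


# Hypothetical visitor record; "custom" is an assumed metadata group name.
visitor = {
    "id": "visitor-1",
    "agent_role": "admin",
    "custom": '{"plan": "enterprise", "seats": 25}',
    "incompleteFields": [],
}
groups = decode_metadata_groups(visitor, ["custom"])
print(groups["custom"]["plan"])
```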
Account file schema
Our accounts file schema defines the structure and format of the account data that we export in avro files for Data Sync account exports.
Metric | Type | Description |
---|---|---|
id |
STRING | Unique identifier for the account. |
auto.id |
STRING | The same unique identifier as the id . |
firstvistMS |
LONG | The timestamp in milliseconds when an event was first captured for the account. |
idhash |
INT | The hash of the Account ID. |
lastvisitMS |
LONG | The timestamp in milliseconds when an event was last recorded for the account. |
lastupdatedMS |
LONG | The timestamp in milliseconds when the account was last updated. |
<fieldname>_<APP>_<ID> |
Varies depending on the field: STRING, LONG, or INT | In a multi-app subscription, Pendo sends <fieldname>_<APP>_<ID> values for any of the above metadata fields that might vary across applications. Metadata fields for which this is possible have isPerApp set to true in the metadata schema file. |
agent_<fieldname> |
Varies depending on the field. | A field is created in the account avro file for each agent metadata field. Field types depend on how you set up your data in the Data Mappings page in Pendo. For a mapping of Pendo metadata field types to avro types, see Pendo to avro data mappings. |
<metadata_group> |
STRING | For any metadata group other than agent , we send a JSON representation of the metadata group and all its fields. The metadata schema file contains the name, type, and other information for each field in the metadata group. |
incompleteFields |
List of STRINGs | List of metadata fields with values that were incompatible with the type specified in your metadata schema. |
Metadata schema file
Metric | Type | Description |
---|---|---|
avroFieldName |
STRING | The name of the field in the visitor or account avro file. |
name |
STRING | The name of the metadata field in Pendo. |
group |
STRING | The name of the metadata group that the field belongs to in Pendo. |
displayName |
STRING | The display name of the metadata field in Pendo. |
type |
STRING | The data type of the field. Options include string , int , float , boolean , time , or list . |
elementType |
STRING | If type is list , the data type of each element. Otherwise, this is empty. |
elementformat |
STRING | The data format of the metadata field. For example, if type is time , then elementformat might be milliseconds . |
isDeleted |
BOOLEAN | True if the field has been deleted. |
isPerApp |
BOOLEAN | True if the field can exist for each application. This is only possible for multi-app subscriptions. |
Pendo to avro data mappings
Field type in Pendo Data Mappings | Avro type |
---|---|
Text (string) | STRING |
Number (int) | INT |
Number (float) | FLOAT |
Boolean (boolean) | BOOLEAN |
Date (time) | LONG (with logical type of milliseconds) |
List (list) | ARRAY |
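The mapping table above can be encoded as a lookup and applied to rows of the metadata schema file, for example when generating warehouse column types. This is a sketch under assumptions: it takes the type values (string, int, float, boolean, time, list) from the metadata schema table, assumes elementType uses the same names, and uses an ARRAY<...> notation that is purely illustrative. The function name and sample rows are hypothetical.

```python
# Pendo metadata type name -> avro type, following the table above.
PENDO_TO_AVRO = {
    "string": "STRING",
    "int": "INT",
    "float": "FLOAT",
    "boolean": "BOOLEAN",
    "time": "LONG",  # carries a milliseconds logical type per the table
    "list": "ARRAY",
}


def avro_type_for(schema_row):
    """Return a readable avro type label for one metadata schema row."""
    pendo_type = schema_row["type"]
    avro_type = PENDO_TO_AVRO[pendo_type]
    element_type = schema_row.get("elementType")
    if pendo_type == "list" and element_type:
        # Illustrative notation only, not avro schema syntax.
        return f"ARRAY<{PENDO_TO_AVRO[element_type]}>"
    return avro_type


# Hypothetical metadata schema rows.
rows = [
    {"avroFieldName": "agent_tags", "type": "list", "elementType": "string"},
    {"avroFieldName": "agent_signup", "type": "time", "elementType": ""},
]
print([avro_type_for(row) for row in rows])
```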
Example
The events file schema applies to all files for each of the event types: All Events, Pages, Features, and Track Events. There's no Guide-specific file. Instead, guide events are included in the All Events file, as well as the relevant Pages, Features, and Track Events files.
For example, if "Guide Y" is launched on "Page X", a guideActivity event would be present in the All Events file and in the event stream for Page X with guideId set to Y. If an event isn't a guideActivity event, the fields associated with guide events are blank.
Additionally, there can be more than one event file for each type of event. For example, if your application has three tagged Pages, two tagged Features, and two defined Track Events, you would receive 12 avro files for each export:
- Guide definitions (allguides.avro)
- Page definitions (allpages.avro)
- Feature definitions (allfeatures.avro)
- Track Event definitions (alltracktypes.avro)
- All Events file (allevents.avro)
- Three Page event files (page1.avro, page2.avro, page3.avro)
- Two Feature event files (feature1.avro, feature2.avro)
- Two Track Event files (tracktype1.avro, tracktype2.avro)
The file names in this list are for illustrative purposes. The Page, Feature, and Track Event file names would reflect the appropriate ID that can be found in the billofmaterials.json for each export.
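The count of 12 in the example above comes from four definition files, one All Events file, and one event file per tagged Page, Feature, and Track Event. The sketch below reproduces that arithmetic. It assumes exactly one event file per defined item, as in the example; the function name is hypothetical, and the export manifest remains the authoritative list of files in any given export.

```python
def expected_file_count(pages, features, track_events):
    """Estimate the number of avro files in one event export."""
    definition_files = 4  # guides, pages, features, and track types
    all_events_files = 1
    # Assumption: one event file per tagged Page, Feature, and Track Event.
    return definition_files + all_events_files + pages + features + track_events


print(expected_file_count(pages=3, features=2, track_events=2))
```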