Data Sync schema definitions

Pendo Data Sync allows you to export avro files to a Google Cloud Storage (GCS) bucket, Amazon S3, or Microsoft Azure Storage. We use the avro file type because it's well-suited for use in "extract, transform, load" (ETL) pipelines. This article describes the avro files that are exported as part of the Data Sync process and the values contained in each type of avro file.

Data Sync exports

Pendo exports three types of avro files as part of the Data Sync process: Event exports, Visitor exports, and Account exports.

Dates and timestamps are in Coordinated Universal Time (UTC). For more information, see Timestamps and time zones in Data Sync export handling.

Event exports

Exports include event files and event type definitions, as well as contextual information so that you can get full value out of the events without having to do additional lookups or API calls. To this end, exports contain:

  • An event file for all events, even those your team has yet to define.
  • An event file for each defined event type ("matchable"): Page, Feature, and Track Event. 
  • A definition file for each event type, plus Guide Event, including metadata and details about each defined event.
  • An export manifest that references all the files outlined above. This is to help you load data from your cloud storage into your data warehouse.

Visitor exports

Visitor exports include visitor files and the visitor metadata schema definition. Exports contain:

  • One or more visitor files for all visitors exported. If you have a large number of visitors, we might send them in batched files named in the following format: visitors-000.avro, visitors-001.avro, and so on.
  • A metadata schema definition file named metadataschema.avro.
  • An export manifest that references all the files outlined above. This is to help you load data from your cloud storage into your data warehouse.

Account exports

Account exports include account files and the account metadata schema definition. Exports contain:

  • One or more account files for all accounts exported. If you have a large number of accounts, we might send them in batched files named in the following format: accounts-000.avro, accounts-001.avro, and so on.
  • A metadata schema definition file named metadataschema.avro.
  • An export manifest that references all the files outlined above. This is to help you load data from your cloud storage into your data warehouse.
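
The batched naming described above for visitor and account files can be handled with a small helper. This is a minimal sketch that assumes you already have a plain list of file names from your storage bucket; the function name sort_batches and the sample listing are hypothetical and not part of Data Sync.

```python
import re

# Matches batched names such as visitors-000.avro or accounts-012.avro.
BATCH_NAME = re.compile(r"^(visitors|accounts)-(\d+)\.avro$")

def sort_batches(file_names, kind):
    """Return the batched files of one kind ("visitors" or "accounts") in numeric order."""
    batches = []
    for name in file_names:
        match = BATCH_NAME.match(name)
        if match and match.group(1) == kind:
            batches.append((int(match.group(2)), name))
    return [name for _, name in sorted(batches)]

# Hypothetical listing of one visitor export folder.
listing = ["visitors-001.avro", "metadataschema.avro", "visitors-000.avro"]
print(sort_batches(listing, "visitors"))
```

Sorting on the numeric suffix rather than the raw string keeps the order stable if the number of digits ever differs between batches.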

Events file schema

Our events file schema defines the structure and format of the events data we export in avro files for Data Sync. The schema applies to all types of event file:

  • All Events
  • Page
  • Feature
  • Track Event

Guide events are contained within the above event-type files, and so don't have their own event file, but do have their own definition file.

The schema definition that follows includes information about the different types of events that can be generated, the fields that are associated with each event type, and the data types that are used to define each field.

Metric Type Description
matchableId STRING Reference to the matchable (Page, Feature, or Track Event) that has a rule matching this event. Not present in the All Events file.
periodId DATE Convenience field to assist in data warehouse loading, equal to the date portion of the browserTimestamp for a given event. All dates are in UTC and in the following format: YYYY-MM-DD.
visitorId STRING The Visitor ID for the event.
accountId STRING The Account ID for the event. An empty string is used when no account information is available.
browserTimestamp LONG Timestamp of the event. These can be loaded into a data warehouse as dates with the use_avro_logical_types flag.
country STRING Country associated with the remoteIp. This field can be blank.
destinationStepId STRING For guideAdvanced events, the ID of the destination step in the guide flow. This can be the previous or next step ID. This appears in the singleEvents and guideEvents sources.
elementPath STRING For web events. This is either empty or a CSS-style string specifying the DOM element related to the event. For mobile, this is a JSON-formatted description of the widget related to this event.
eventClass STRING ui or track.
eventId STRING Unique event identifier.
eventSource STRING Event sources include: email, for events that originated from an email; mobile, for events originating in a mobile app; and web, for events originating in a web page.
eventType STRING Type of event. Examples include: change, indicating that a visitor has changed an element in the app; click, indicating that a visitor has clicked on an element in the app; focus, indicating that a visitor has focused on an element in the app; group, an event in response to the "group" call with a Twilio Segment integration; guideActivity, indicating that a visitor has interacted with a Pendo guide; guideDismissed, indicating that a visitor has dismissed a Pendo guide; guideSeen, indicating that a visitor has seen a Pendo guide; identify, indicating that the app identified the visitor or account identity; load, indicating that a page was loaded; meta, indicating that the app sent visitor and account metadata; and pollResponse, indicating that a visitor submitted a Pendo poll.
guideId STRING Unique identifier of the guide generating guide- and poll-related events. This field can be blank.
guideSeenReason STRING For the guideSeen event type, why the guide was displayed to the user.
guideSeenTimeoutMS LONG For guideTimeout events, the amount of time that the agent waited to show a specific guide step before sending a guideTimeout event.
guideSessionId STRING Identifiers of the list of guides and other deliverables that were loaded together. This ID changes every time guides are requested by the client and events that happen between each load carry the same ID.
guideSnoozeDurationMS LONG For the guideSnoozed event type, the amount of time the guide was snoozed for in milliseconds.
guideStepId STRING Unique identifier of the guide step generating the guide- or poll-related event. This field can be blank.
language STRING For the guideSeen event type, the language of the guide being displayed. This field can be blank.
latitude FLOAT The latitude of the remoteIp for the event. This field can be blank.
longitude FLOAT The longitude of the remoteIp for the event. This field can be blank.
oldVisitorId STRING The anonymous Visitor ID that was previously assigned to the visitor before they were identified by Pendo through authentication.
loadDurationMS LONG For load events, the amount of time it took for the webpage to render in milliseconds. This doesn't include the time it takes for dynamic parts of the page to load.
pollId STRING The identifier of the poll that generated any pollResponse events.
pollResponse STRING The JSON-formatted response to the poll that generated a pollResponse event. This can be an index into data that's only available in the poll itself. If the eventType isn't pollResponse, there is no pollResponse property.
pollType STRING The type of poll that generated a pollResponse event. This could be NumberScale, PositiveNegative, FreeForm, or PickList.   
propertiesJson STRING A JSON-formatted map of all the user-defined event properties for this event, including page parameters. If historical metadata has been promoted to the visitor or the account, this JSON includes a visitormetadata and accountmetadata string, respectively.
region STRING The United States region, presented as a two-letter code for the state, of the remoteIp for the event. This field can be blank.
remoteIp STRING The remoteIp that generated the event. If not collecting this data, 0.0.0.0 is stored instead. This can also be an ipv6 address. Some proxies and mobile networks prevent useful IP addresses being collected.
server STRING The server name portion of the URL for the event. This field can be blank.
uiElementActions STRING The element actions, such as openLink or guideSnoozed, associated with a guide interaction where a guideActivity event is sent by the agent. This appears in the following sources: singleEvents and guideEvents.
uiElementId STRING The guide element's unique identifier when a UI element inside a guide is clicked on, which sends a guideActivity event to the agent. This appears in the following sources: singleEvents, guideEvents, guideElementClick, and guideElementClickEver.
uiElementText STRING The guide element's text for when the agent sends the field as part of a guideActivity event. When a guideActivity event is sent with the guide element's text, ui_element_text should appear in the following sources: singleEvents and guideEvents. If a guideActivity event is sent without the ui_element_text field, we still process it. If a subscription has opted to exclude all text, ui_element_text isn't stored, even if the event is sent with it.
uiElementType STRING The type of element that was clicked when a guideActivity event is being sent. This appears in the following sources: singleEvents and guideEvents.
url STRING The normalized URL for the page that generated a web event. For mobile events, the URL is a JSON representation of the screen structure.
userAgent STRING The user agent from the HTTPS request when a web event is received. For mobile events, the userAgent is the textual representation of the device type that generated the event. Both types of value are properly parsed by the user agent parsing functions with aggregations.
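
As an illustration of how a few of these fields relate, the following sketch derives periodId from browserTimestamp and decodes propertiesJson. It assumes the event has already been decoded into a Python dict (for example, by an avro reader library) and that browserTimestamp is an epoch value in milliseconds; if your loader applies avro logical types, you might receive a datetime instead. The helper names and the sample record are hypothetical.

```python
import json
from datetime import datetime, timezone

def period_id_from_timestamp(browser_timestamp_ms):
    """Return the UTC date portion (YYYY-MM-DD) of an epoch-millisecond timestamp."""
    moment = datetime.fromtimestamp(browser_timestamp_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")

def event_properties(event):
    """Decode propertiesJson into a dict; a missing or empty value means no properties."""
    raw = event.get("propertiesJson") or "{}"
    return json.loads(raw)

def has_account(event):
    """An empty accountId string means no account information was available."""
    return bool(event.get("accountId"))

# Hypothetical decoded event record, using field names from the table above.
event = {
    "visitorId": "visitor-1",
    "accountId": "",
    "browserTimestamp": 1700000000000,
    "propertiesJson": '{"plan": "pro"}',
}

print(period_id_from_timestamp(event["browserTimestamp"]))  # 2023-11-14
print(event_properties(event))
print(has_account(event))
```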

Event type definitions

The definitions for Pages, Features, and Track Events are included in their own avro files, which can be added to the event data contained in the relevant event file.

Pages

Metric Type Description
pageId STRING Page identifier.
kind STRING Description of the type of object. This will always be Page.
lastUpdatedAt LONG Epoch timestamp for when the Page was last updated in milliseconds.
createdAt LONG Epoch timestamp for when the Page was created in milliseconds.
rulesJson STRING The regex rules that define the Page if created by Pendo's classic (legacy) Designer. This field is empty for Pages that were created with the Visual Design Studio instead of the classic Designer.
name STRING The name given to the Page.
isCoreEvent BOOLEAN Whether the event is a Pendo Core Event.
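
One way to add Page definitions to event data is a simple lookup join. This sketch assumes that the matchableId in a Page event file equals the pageId in the Page definitions file, and that both files have already been decoded into lists of dicts; the function name attach_page_names and the sample values are hypothetical.

```python
def attach_page_names(events, page_definitions):
    """Add a pageName key to each event by looking up matchableId in the Page definitions."""
    names_by_id = {page["pageId"]: page["name"] for page in page_definitions}
    return [
        {**event, "pageName": names_by_id.get(event.get("matchableId"))}
        for event in events
    ]

# Hypothetical decoded records.
pages = [{"pageId": "page-a", "kind": "Page", "name": "Dashboard"}]
events = [
    {"matchableId": "page-a", "visitorId": "visitor-1"},
    {"matchableId": "page-z", "visitorId": "visitor-2"},  # no matching definition
]

for row in attach_page_names(events, pages):
    print(row["visitorId"], row["pageName"])
```

Events without a matching definition keep a pageName of None rather than being dropped, which makes missing definitions easy to spot downstream.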

Features

Metric Type Description
featureId STRING Feature identifier (unique for each subscription).
kind STRING Description of the type of object. This will always be Feature.
lastUpdatedAt LONG Epoch timestamp for when the Feature was last updated in milliseconds.
createdAt LONG Epoch timestamp for when the Feature was created in milliseconds.
pageId STRING The Page identifier containing the Feature.
name STRING The name given to the Feature.
isCoreEvent BOOLEAN Whether the event is a Pendo Core Event.

Track Events

Metric Type Description
trackTypeId STRING Track Event identifier (unique for each subscription).
kind STRING Description of the type of object. This will always be TrackType.
lastUpdatedAt LONG Epoch timestamp for when the Track Event was last updated in milliseconds.
createdAt LONG Epoch timestamp for when the Track Event was created in milliseconds.
eventPropertyNames STRING The names of the Track Event properties included.
name STRING The name given to the Track Event.
isCoreEvent BOOLEAN Whether the event is a Pendo Core Event.

Guides

Metric Type Description
guideId STRING Guide identifier (unique for each subscription).
kind STRING Description of the type of object. This will always be Guide.
lastUpdatedAt LONG Epoch timestamp for when the guide was last updated in milliseconds.
createdAt LONG Epoch timestamp for when the guide was created in milliseconds.
state STRING The visibility state of the guide: draft, staged, public, or disabled.
name STRING The name given to the Guide.
emailState STRING The state of email backup for NPS: draft when disabled, and public when enabled.
launchMethod STRING The set of launch methods a guide might use, delineated by a hyphen.
isMultiStep BOOLEAN Whether a guide has more than one step.
isTraining BOOLEAN Whether the guide belongs to an "Adopt for Partners" end-user application.
recurrence LONG The recurrence period for an NPS guide in milliseconds.
recurrenceEligibilityWindow LONG The length of time in milliseconds for which an individual visitor is eligible for an NPS guide when even distribution is enabled.
attributeJson STRING JSON representation of guide attributes, including the type of guide, the badge description, the types of devices the guide is enabled for, and the last version of the Visual Design Studio that the guide was edited on.
audience STRING The logic defining the visitors targeted by the guide.
audienceUiHint STRING A more human-readable representation of the segment that was applied to the guide.
resetAt LONG The timestamp for when the guide was last reset.
publishedAt LONG The timestamp for when the guide was most recently published.
steps RECORD Guide steps containing STRING values for guideStepId, name, pageId, appRelayUrl, and elements.

At this time, we don't send poll questions in the guide object schema. 
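
Because launchMethod packs several values into one hyphen-delimited string, it usually needs splitting before analysis. The following is a minimal sketch that assumes the guide definition has been decoded into a dict; the sample value "auto-badge" is hypothetical and only illustrates the delimiter.

```python
def launch_methods(guide):
    """Split the hyphen-delimited launchMethod string into a list of methods."""
    value = guide.get("launchMethod") or ""
    return [method for method in value.split("-") if method]

# Hypothetical decoded guide definition.
guide = {"guideId": "guide-1", "name": "Welcome tour", "launchMethod": "auto-badge"}
print(launch_methods(guide))
```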

Visitors file schema

Our visitors file schema defines the structure and format of the visitor data that we export in avro files for Data Sync visitor exports.

Metric Type Description
id STRING Unique identifier for the visitor.
accountids ARRAY List of STRING values that are unique Account IDs to which the visitor belongs.
accountid STRING The Account ID last associated with the visitor.
lastservername STRING The most recent server name.
firstvistMS LONG The timestamp in milliseconds when an event was first captured for the visitor.
idhash INT The hash of the Visitor ID.
lastbrowsername STRING The most recent browser name.
lastbrowserversion STRING The most recent browser version.
lastoperatingsystem STRING The most recent operating system.
lastvisitMS LONG The timestamp in milliseconds when an event was last recorded for the visitor.
lastupdatedMS LONG The timestamp in milliseconds when the visitor was last updated.
lastuseragent STRING The most recent user agent (unparsed).
identifiedvisitoratMS LONG If an anonymous Visitor ID is merged with an identified Visitor ID through Pendo identity management, this field shows the timestamp in milliseconds for when the visitor was identified.
identifiedvisitorid STRING If an anonymous Visitor ID is merged with an identified Visitor ID through Pendo identity management, this field shows the identified Visitor ID. This allows for a unified view of visitor journeys.
<fieldname>_<APP>_<ID> Varies depending on the field: STRING, LONG, or INT In a multi-app subscription, Pendo sends <fieldname>_<APP>_<ID> values for any of the above metadata fields that might vary across applications. Metadata fields for which this is possible have isPerApp set to true in the metadata schema file.
agent_<fieldname> Varies depending on the field. A field is created in the visitor avro file for all agent metadata fields. Field types depend on how you set up your data in the Data Mappings page in Pendo. For a mapping of Pendo metadata field types to avro types, see Pendo to avro data mappings.
<metadata_group> STRING For any metadata group other than agent, we send a JSON representation of the metadata group and all its fields. The metadata schema file contains the name, type, and other information for each field in the metadata group.
incompleteFields List of STRINGs List of metadata fields with values that were incompatible with the type specified in your metadata schema.
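
Since non-agent metadata groups arrive as JSON strings and timestamps arrive as milliseconds, a visitor record usually needs light decoding after it's read. This sketch assumes the record is already a Python dict and that the metadata group is stored under a field named after the group; the group name "custom" and the helper names are hypothetical.

```python
import json
from datetime import datetime, timezone

def decode_metadata_group(visitor, group_name):
    """Parse the JSON string stored for a metadata group; return {} if it's absent or empty."""
    raw = visitor.get(group_name)
    return json.loads(raw) if raw else {}

def last_visit_utc(visitor):
    """Convert lastvisitMS (epoch milliseconds) into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(visitor["lastvisitMS"] / 1000, tz=timezone.utc)

# Hypothetical decoded visitor record.
visitor = {
    "id": "visitor-1",
    "lastvisitMS": 1700000000000,
    "custom": '{"tier": "gold"}',  # hypothetical metadata group named "custom"
    "incompleteFields": [],
}

print(decode_metadata_group(visitor, "custom"))
print(last_visit_utc(visitor).isoformat())
```

Checking incompleteFields before trusting a typed value is a reasonable extra step, because it lists fields whose values didn't match the declared type.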

Accounts file schema

Our accounts file schema defines the structure and format of the account data that we export in avro files for Data Sync account exports.

Metric Type Description
id STRING Unique identifier for the account.
auto.id STRING The same unique identifier as the id.
firstvistMS LONG The timestamp in milliseconds when an event was first captured for the account.
idhash INT The hash of the Account ID.
lastvisitMS LONG The timestamp in milliseconds when an event was last recorded for the account.
lastupdatedMS LONG The timestamp in milliseconds when the account was last updated.
<fieldname>_<APP>_<ID> Varies depending on the field: STRING, LONG, or INT In a multi-app subscription, Pendo sends <fieldname>_<APP>_<ID> values for any of the above metadata fields that might vary across applications. Metadata fields for which this is possible have isPerApp set to true in the metadata schema file.
agent_<fieldname> Varies depending on the field. A field is created in the account avro file for all agent metadata fields. Field types depend on how you set up your data in the Data Mappings page in Pendo. For a mapping of Pendo metadata field types to avro types, see Pendo to avro data mappings.
<metadata_group> STRING For any metadata group other than agent, we send a JSON representation of the metadata group and all its fields. The metadata schema file contains the name, type, and other information for each field in the metadata group.
incompleteFields List of STRINGs List of metadata fields with values that were incompatible with the type specified in your metadata schema.

Metadata schema file

Metric Type Description
avroFieldName STRING The name of the field in the visitor or account avro file.
name STRING The name of the metadata field in Pendo.
group STRING The name of the group that the metadata field belongs to in Pendo.
displayName STRING The display name of the metadata field in Pendo.
type STRING The data type of the field. Options include string, int, float, boolean, time, or list.
elementType STRING If type is a list, the data type of each element. Otherwise, this is empty.
elementformat STRING The data format of the metadata field. For example, if type is time, then elementformat might be milliseconds.
isDeleted BOOLEAN True if the field has been deleted.
isPerApp BOOLEAN True if the field can exist for each application. This is only possible for multi-app subscriptions.
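
The metadata schema file can drive how you interpret the visitor or account file. The following sketch assumes the rows of metadataschema.avro have been decoded into dicts; the helper names and sample rows are hypothetical.

```python
def active_field_types(schema_rows):
    """Map avroFieldName to its declared type, skipping deleted fields."""
    return {
        row["avroFieldName"]: row["type"]
        for row in schema_rows
        if not row.get("isDeleted", False)
    }

def per_app_fields(schema_rows):
    """Return the avroFieldName of every non-deleted field flagged isPerApp."""
    return [
        row["avroFieldName"]
        for row in schema_rows
        if row.get("isPerApp") and not row.get("isDeleted", False)
    ]

# Hypothetical rows decoded from metadataschema.avro.
schema_rows = [
    {"avroFieldName": "agent_plan", "type": "string", "isDeleted": False, "isPerApp": False},
    {"avroFieldName": "agent_seats", "type": "int", "isDeleted": True, "isPerApp": False},
    {"avroFieldName": "agent_role", "type": "string", "isDeleted": False, "isPerApp": True},
]

print(active_field_types(schema_rows))
print(per_app_fields(schema_rows))
```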

Pendo to avro data mappings

Field type in Pendo Data Mappings  Avro type
Text (string) STRING
Number (int) INT
Number (float) FLOAT
Boolean (boolean) BOOLEAN
Date (time) LONG (with logical type of milliseconds)
List (list) ARRAY
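
The mapping above can be kept as a small lookup table in a loading script. This is a sketch only: the dictionary keys are the type values listed for the metadata schema file, and the function name avro_type_for is hypothetical.

```python
# Pendo metadata type -> avro type, transcribed from the table above.
PENDO_TO_AVRO = {
    "string": "STRING",
    "int": "INT",
    "float": "FLOAT",
    "boolean": "BOOLEAN",
    "time": "LONG",  # carries a milliseconds logical type
    "list": "ARRAY",
}

def avro_type_for(pendo_type):
    """Return the avro type for a Pendo metadata type, or raise for an unknown type."""
    try:
        return PENDO_TO_AVRO[pendo_type]
    except KeyError:
        raise ValueError(f"Unknown Pendo metadata type: {pendo_type!r}") from None

print(avro_type_for("time"))
```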

Example

The events file schema applies to all files for each of the event types: All Events, Pages, Features, and Track Events. There's no Guide-specific file. Instead, guide events are included in the All Events file, as well as the relevant Pages, Features, and Track Events files.

For example, if "Guide Y" is launched on "Page X", a guideActivity event would be present in the All Events file and in the event stream for Page X with guideId set to Y. If an event isn't a guideActivity event, the fields associated with guide events are blank.

Additionally, there can be more than one event file for each type of event. For example, if your application has three tagged Pages, two tagged Features, and two defined Track Events, you would receive 12 avro files for each export:

  • Guide definitions (allguides.avro)
  • Page definitions (allpages.avro) 
  • Feature definitions (allfeatures.avro)
  • Track Event definitions (alltracktypes.avro)
  • All Events file (allevents.avro)
  • Three Page event files (page1.avro, page2.avro, page3.avro)
  • Two Feature event files (feature1.avro, feature2.avro)
  • Two Track Event files (tracktype1.avro, tracktype2.avro)

The file names in this list are for illustrative purposes. The Page, Feature, and Track Event file names would reflect the appropriate ID that can be found in the billofmaterials.json for each export. 
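
The file count in this example can be checked with simple arithmetic. The sketch below assumes, as in the example, four definition files, one All Events file, and exactly one event file for each tagged Page, Feature, and Track Event; the function name is hypothetical.

```python
def expected_file_count(pages, features, track_events):
    """Count avro files in one event export under the one-file-per-matchable assumption."""
    definition_files = 4  # guides, pages, features, and track types
    all_events_files = 1
    return definition_files + all_events_files + pages + features + track_events

# Three tagged Pages, two tagged Features, two defined Track Events.
print(expected_file_count(pages=3, features=2, track_events=2))  # 12
```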
