Data Sync schema definitions

Pendo Data Sync allows you to export avro files to a Google Cloud Storage (GCS) bucket, Amazon S3, or Microsoft Azure Storage. We use the avro file type because it's well-suited for use in "extract, transform, load" (ETL) pipelines. This article describes the avro files that are exported as part of the Data Sync process and the values contained in each type of avro file.

Data Sync exports

Exports include event files and event type definitions, as well as contextual information so that you can get full value out of the events without having to do additional lookups or API calls. To this end, exports contain:

  • An event file for all events, even those your team has yet to define.
  • An event file for each defined event type ("matchable"): Page, Feature, and Track Event. 
  • A definition file for each event type, plus Guide Event, including metadata and details about each defined event.
  • An export manifest that references all the files outlined above. This is to help you load data from your cloud storage into your data warehouse.

Dates and timestamps are in Coordinated Universal Time (UTC).

Events file schema

Our events file schema defines the structure and format of the events data we export in avro files for Data Sync. The schema applies to all types of event file:

  • All Events
  • Page
  • Feature
  • Track Event

Guide events are contained within the above event-type files, and so don't have their own event file, but do have their own definition file.

The schema definition that follows includes information about the different types of events that can be generated, the fields that are associated with each event type, and the data types that are used to define each field.

Metric Type Description
matchableId STRING Reference to the matchable (Page, Feature, or Track Event) that has a rule matching this event. Not present in the All Events file.
periodId DATE Convenience field to assist in data warehouse loading, equal to the date portion of the browserTimestamp for a given event. All dates are in UTC and in the following format: YYYY-MM-DD.
visitorId STRING The Visitor ID for the event.
accountId STRING The Account ID for the event. An empty string is used when no account information is available.
browserTimestamp LONG Timestamp of the event. These can be loaded into a data warehouse as dates with the use_avro_logical_types flag.
country STRING Country associated with the remoteIp. This field can be blank.
destinationStepId STRING Relates to guideAdvanced events that specify the ID of the destination step in the guide showing flow. This can be the previous or next step ID. This shows up in the singleEvents and guideEvents sources.
elementPath STRING For web events. This is either empty or a CSS-style string specifying the DOM element related to the event. For mobile, this is a JSON-formatted description of the widget related to this event.
eventClass STRING ui or track.
eventId STRING Unique event identifier.
eventSource STRING Event sources include: email, events that originated from an email; mobile, events originating in a mobile app; and web, events originating in a web page.
eventType STRING Type of event. Examples include: change, indicating that a visitor has changed an element in the app; click, indicating that a visitor has clicked on an element in the app; focus, indicating that a visitor has focused on an element in the app; group, an event in response to the "group" call with a Twilio Segment integration; guideActivity, indicating that a visitor has interacted with a Pendo guide; guideDismissed, indicating that a visitor has dismissed a Pendo guide; guideSeen, indicating that a visitor has seen a Pendo guide; identify, indicating that the app identified the visitor or account identity; load, indicating that a page was loaded; meta, indicating that the app sent visitor and account metadata; and pollResponse, indicating that a visitor submitted a Pendo poll.
guideId STRING Unique identifier of the guide generating guide- and poll-related events. This field can be blank.
guideSeenReason STRING For the guideSeen event type, why the guide was displayed to the user.
guideSeenTimeoutMS LONG For guideTimeout events, the amount of time that the agent waited to show a specific guide step before sending a guideTimeout event.
guideSessionId STRING Identifiers of the list of guides and other deliverables that were loaded together. This ID changes every time guides are requested by the client and events that happen between each load carry the same ID.
guideSnoozeDurationMS LONG For the guideSnoozed event type, the amount of time the guide was snoozed for in milliseconds.
guideStepId STRING Unique identifier of the guide step generating the guide- or poll-related event. This field can be blank.
language STRING For the guideSeen event type, the language of the guide being displayed. This field can be blank.
latitude FLOAT The latitude of the remoteIp for the event. This field can be blank.
longitude FLOAT The longitude of the remoteIp for the event. This field can be blank.
oldVisitorId STRING The anonymous visitor ID that was assigned to the visitor before they were identified by Pendo through authentication.
loadDurationMS LONG For load events, the amount of time it took for the webpage to render in milliseconds. This doesn't include the time it takes for dynamic parts of the page to load.
pollId STRING The identifier of the poll that generated any pollResponse events.
pollResponse STRING The JSON-formatted response to the poll that generated a pollResponse event. This can be an index into data that's only available in the poll itself. If the eventType isn't pollResponse, there is no pollResponse property.
pollType STRING The type of poll that generated a pollResponse event. This could be NumberScale, PositiveNegative, FreeForm, or PickList.   
propertiesJson STRING A JSON-formatted map of all the user-defined event properties for this event. If historical metadata has been promoted to the visitor or the account, this JSON includes a visitor metadata string or an account metadata string, respectively.
region STRING The United States region, presented as a two-letter code for the state, of the remoteIp for the event. This field can be blank.
remoteIp STRING The IP address that generated the event. If this data isn't being collected, 0.0.0.0 is stored instead. This can also be an IPv6 address. Some proxies and mobile networks prevent useful IP addresses from being collected.
server STRING The server name portion of the URL for the event. This field can be blank.
uiElementActions STRING The element actions, such as openLink or guideSnoozed, associated with a guide interaction where a guideActivity event is sent by the agent. This appears in the following sources: singleEvents and guideEvents.
uiElementId STRING The guide element's unique identifier when a UI element inside a guide is clicked on, which sends a guideActivity event to the agent. This appears in the following sources: singleEvents, guideEvents, guideElementClick, and guideElementClickEver.
uiElementText STRING The guide element's text for when the agent sends the field as part of a guideActivity event. When a guideActivity event is sent with the guide element's text, ui_element_text should appear in the following sources: singleEvents and guideEvents. If a guideActivity event is sent without the ui_element_text field, we still process it. If a subscription has opted to exclude all text, ui_element_text isn't stored, even if the event is sent with it.
uiElementType STRING The type of element that was clicked when a guideActivity event is being sent. This appears in the following sources: singleEvents and guideEvents.
url STRING The normalized URL for the page that generated a web event. For mobile events, the URL is a JSON representation of the screen structure.
userAgent STRING The user agent from the HTTPS request when a web event is received. For mobile events, the userAgent is the textual representation of the device type that generated the event. Both types of value are properly parsed by the user agent parsing functions with aggregations.
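
As a rough illustration of how the fields above relate to each other, the following sketch works on an event record that has already been decoded into a Python dictionary (for example, by an avro reader). It assumes that browserTimestamp is expressed in epoch milliseconds, which the table above doesn't state, and that periodId has been rendered as a YYYY-MM-DD string. The helper names and the sample record are hypothetical.

```python
import json
from datetime import datetime, timezone


def period_id_from_timestamp(browser_timestamp_ms):
    """Return the UTC date (YYYY-MM-DD) for an epoch-millisecond timestamp."""
    # Assumption: browserTimestamp is in milliseconds since the Unix epoch.
    moment = datetime.fromtimestamp(browser_timestamp_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


def event_properties(event):
    """Decode propertiesJson into a dict; a missing or empty value becomes {}."""
    raw = event.get("propertiesJson") or "{}"
    return json.loads(raw)


# Hypothetical record, shaped like a row from an event file.
sample_event = {
    "visitorId": "visitor-1",
    "accountId": "",  # empty string: no account information available
    "browserTimestamp": 1700000000000,
    "periodId": "2023-11-14",
    "eventType": "click",
    "propertiesJson": '{"plan": "trial"}',
}

derived_period = period_id_from_timestamp(sample_event["browserTimestamp"])
properties = event_properties(sample_event)
```

Under these assumptions, derived_period should equal the record's periodId, which can serve as a simple sanity check when loading events into a warehouse.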

Event type definitions

The definitions for Pages, Features, and Track Events are included in their own avro files, which can be added to the event data contained in the relevant event file.
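
One way to combine a definition file with its event file is a lookup keyed on the definition's identifier. The sketch below assumes that both files have already been decoded into lists of Python dictionaries and that matchableId in an event holds the same value as pageId, featureId, or trackTypeId in the corresponding definition. The function names and the matchableName output key are hypothetical.

```python
def index_definitions(definitions, id_field):
    """Map each definition's identifier (for example, pageId) to its record."""
    return {definition[id_field]: definition for definition in definitions}


def attach_names(events, definitions_by_id):
    """Return copies of the events with the matching definition name added."""
    enriched = []
    for event in events:
        definition = definitions_by_id.get(event.get("matchableId"))
        # matchableName is a hypothetical key; None means no definition matched.
        name = definition["name"] if definition else None
        enriched.append({**event, "matchableName": name})
    return enriched


# Hypothetical rows standing in for a Page definition file and a Page event file.
page_definitions = [{"pageId": "p1", "kind": "Page", "name": "Dashboard"}]
page_events = [
    {"matchableId": "p1", "eventType": "load"},
    {"matchableId": "p2", "eventType": "load"},
]

enriched_events = attach_names(
    page_events, index_definitions(page_definitions, "pageId")
)
```

The same two helpers can be reused for Features and Track Events by passing featureId or trackTypeId as the id_field.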

Pages

Metric Type Description
pageId STRING Page identifier.
kind STRING Description of the type of object. This will always be Page.
lastUpdatedAt LONG Epoch timestamp for when the Page was last updated in milliseconds.
createdAt LONG Epoch timestamp for when the Page was created in milliseconds.
rulesJson STRING The regex rules that define the Page if created by Pendo Classic (legacy) Designer.
name STRING The name given to the Page.
isCoreEvent BOOLEAN Whether the event is a Pendo Core Event.

Features

Metric Type Description
featureId STRING Feature identifier (unique per subscription).
kind STRING Description of the type of object. This will always be Feature.
lastUpdatedAt LONG Epoch timestamp for when the Feature was last updated in milliseconds.
createdAt LONG Epoch timestamp for when the Feature was created in milliseconds.
pageId STRING The Page identifier containing the Feature.
name STRING The name given to the Feature.
isCoreEvent BOOLEAN Whether the event is a Pendo Core Event.

Track Events

Metric Type Description
trackTypeId STRING Track Event identifier (unique per subscription).
kind STRING Description of the type of object. This will always be TrackType.
lastUpdatedAt LONG Epoch timestamp for when the Track Event was last updated in milliseconds.
createdAt LONG Epoch timestamp for when the Track Event was created in milliseconds.
eventPropertyNames STRING The names of the Track Event properties included.
name STRING The name given to the Track Event.
isCoreEvent BOOLEAN Whether the event is a Pendo Core Event.

Guides

Metric Type Description
guideId STRING Guide identifier (unique per subscription).
kind STRING Description of the type of object. This will always be Guide.
lastUpdatedAt LONG Epoch timestamp for when the guide was last updated in milliseconds.
createdAt LONG Epoch timestamp for when the guide was created in milliseconds.
state STRING The visibility state of the guide: draft, staged, public, or disabled.
name STRING The name given to the Guide.
emailState STRING The state of email backup for NPS: draft when disabled, and public when enabled.
launchMethod STRING The set of launch methods a guide might use, separated by hyphens.
isMultiStep BOOLEAN Whether a guide has more than one step.
isTraining BOOLEAN Whether the guide belongs to an "Adopt for Partners" end-user application.
recurrence LONG The recurrence period for an NPS guide in milliseconds.
recurrenceEligibilityWindow LONG The length of time in milliseconds for which an individual visitor is eligible for an NPS guide when even distribution is enabled.
attributeJson STRING JSON representation of guide attributes, including the type of guide, the badge description, the types of devices the guide is enabled for, and the last version of the Visual Design Studio that the guide was edited on.
audience STRING The logic defining the visitors targeted by the guide.
audienceUiHint STRING A more human-readable representation of the segment that was applied to the guide.
resetAt LONG The timestamp for when the guide was last reset.
publishedAt LONG The timestamp for when the guide was most recently published.
steps RECORD Guide steps containing STRING values for guideStepId, name, pageId, appRelayUrl, and elements.

At this time, we don't send poll questions in the guide object schema. 
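
Some guide definition fields need light post-processing before they're convenient to query. The following sketch splits launchMethod on its hyphen separator and converts recurrence from milliseconds into a timedelta. It assumes a guide record already decoded into a Python dictionary. The helper names and sample values (including the launch method names) are hypothetical.

```python
from datetime import timedelta


def launch_methods(guide):
    """Split the hyphen-separated launchMethod string into a list."""
    value = guide.get("launchMethod") or ""
    return [method for method in value.split("-") if method]


def recurrence_period(guide):
    """Convert the recurrence value from milliseconds to a timedelta."""
    return timedelta(milliseconds=guide.get("recurrence") or 0)


# Hypothetical guide definition row; the values are illustrative only.
sample_guide = {
    "guideId": "g1",
    "kind": "Guide",
    "launchMethod": "auto-badge",
    "recurrence": 7776000000,  # 90 days expressed in milliseconds
}

methods = launch_methods(sample_guide)
period = recurrence_period(sample_guide)
```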

Example

The events file schema applies to all files for each of the event types: All Events, Pages, Features, and Track Events. There's no Guide-specific file. Instead, guide events are included in the All Events file, as well as the relevant Pages, Features, and Track Events files.

For example, if "Guide Y" is launched on "Page X", a guideActivity event would be present in the All Events file and in the event stream for Page X with guideId set to Y. If an event is not a guideActivity event, the fields associated with guide events are blank.
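
To pick out the events for one guide from a Page event file, a filter on guideId is usually enough, since that field is blank on events that aren't guide-related. This is a minimal sketch that assumes the events have been decoded into Python dictionaries. The function name and sample rows are hypothetical.

```python
def guide_events(events, guide_id):
    """Keep only the events whose guideId matches the given guide."""
    return [event for event in events if event.get("guideId") == guide_id]


# Hypothetical rows from the event file for "Page X".
page_x_events = [
    {"eventType": "load", "guideId": ""},
    {"eventType": "guideActivity", "guideId": "Y"},
    {"eventType": "click", "guideId": ""},
]

events_for_guide_y = guide_events(page_x_events, "Y")
```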

Additionally, there can be more than one event file for each type of event. For example, if your application has three tagged Pages, two tagged Features, and two defined Track Events, you would receive 12 avro files per export:

  • Guide definitions (allguides.avro)
  • Page definitions (allpages.avro) 
  • Feature definitions (allfeatures.avro)
  • Track Event definitions (alltracktypes.avro)
  • All Events file (allevents.avro)
  • Three Page event files (page1.avro, page2.avro, page3.avro)
  • Two Feature event files (feature1.avro, feature2.avro)
  • Two Track Event files (tracktype1.avro, tracktype2.avro)

The file names in this list are for illustrative purposes. The Page, Feature, and Track Event file names would reflect the appropriate ID that can be found in the billofmaterials.json for each export. 
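
The file count in this example can be reproduced with simple arithmetic. The sketch below assumes one event file per Page, Feature, and Track Event, as in the list above. Because there can be more than one event file for each type of event, treat the result as a lower bound rather than a guarantee. The function name is hypothetical.

```python
DEFINITION_FILES = 4  # Guide, Page, Feature, and Track Event definitions
ALL_EVENTS_FILES = 1  # the All Events file


def expected_file_count(pages, features, track_events):
    """Count avro files, assuming one event file per matchable."""
    return DEFINITION_FILES + ALL_EVENTS_FILES + pages + features + track_events


# Three tagged Pages, two tagged Features, and two defined Track Events.
example_count = expected_file_count(pages=3, features=2, track_events=2)
```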
