Pendo Data Sync allows you to push data out of Pendo into your data lake or warehouse and business intelligence (BI) tools so that you can perform complex queries and analyses. With Pendo data blended with other key data sources, you can:
- Measure the impact of product improvements on sales and renewals.
- Identify friction in your end-to-end user journey, from top-of-funnel acquisition to in-product activation.
- Calculate a comprehensive customer health score across data sources to shape renewal strategy.
- Create a churn-risk model from a foundation of product usage and sentiment signals.
- Identify up-sell and cross-sell opportunities to drive data-informed account growth.
Prerequisites
Data Sync is a paid feature. Contact your Pendo representative for access. You can also try Data Sync on a subset of data first. For more information, see Test exports in this article.
You must be a subscription admin in Pendo to set up Data Sync.
How it works
Data Sync first involves choosing one of the following cloud storage services as a destination for your Pendo data: Google Cloud Storage, Amazon S3, or Microsoft Azure Storage. Data Sync then delivers every event captured in Pendo to your chosen cloud storage destination as Avro files.
You can set up recurring daily exports and backfill up to three calendar years of historical data. For example, if you create an export on December 31, 2024, the export can extend back to January 1, 2021. Pendo also automatically sends updated files whenever a Page or Feature rule is added or updated in Pendo.
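If you want to see what these files contain before building a full pipeline, you can download one and inspect it locally. The following is a minimal sketch using the open-source fastavro library; the file name is a placeholder for any event file that Data Sync delivers to your bucket.

```python
# Minimal sketch: inspect a downloaded Data Sync Avro file locally.
# Requires the open-source fastavro package (pip install fastavro).
# "event_export.avro" is a placeholder for any file Data Sync delivers.
from fastavro import reader

with open("event_export.avro", "rb") as f:
    avro_reader = reader(f)
    print(avro_reader.writer_schema)  # field names and types in the export
    for i, record in enumerate(avro_reader):
        print(record)                 # one captured event per record
        if i >= 4:                    # stop after a few sample rows
            break
```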
You then create an ETL pipeline to move data from your cloud storage service into a data lake or warehouse, such as Snowflake, Databricks, Google BigQuery, or Amazon Redshift. For guidance on building this pipeline, see ETL for Data Sync in this article.
After data is loaded into your data lake or warehouse, your analytics team can blend it with other data sources and push those insights into your BI source of truth, such as Looker, Tableau, Power BI, or Mode.
ETL for Data Sync
Extract, transform, and load (ETL) is a standard way for organizations to combine data from multiple systems into a data warehouse or data lake. For an overview of ETL and how it works, see the Google Cloud article: What is ETL?
For a typical Data Sync implementation, a data engineer at your organization creates an ETL pipeline that listens for new Pendo exports appearing in your storage bucket. Using those files, the automated process built by your engineer creates and updates tables in your data lake or warehouse (hereafter, warehouse).
For an example of an ETL script that shows how to pull Data Sync exports from Google Cloud Storage (GCS) into Google BigQuery, see data-sync-gcp-export-loading-example.
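As a rough illustration of that pattern, the sketch below loads one day's Avro files from a GCS folder into a BigQuery table. It isn't Pendo's published example; the bucket path, project, dataset, and table names are hypothetical placeholders.

```python
# Minimal sketch: load one day of Data Sync Avro files from GCS into BigQuery.
# Requires google-cloud-bigquery; bucket, project, dataset, and table names
# are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.AVRO,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# A wildcard URI picks up every Avro file in the day's export folder.
uri = "gs://my-pendo-bucket/datasync/2025-01-01/*.avro"
load_job = client.load_table_from_uri(
    uri, "my_project.pendo.all_events", job_config=job_config
)
load_job.result()  # block until the load job finishes
print(f"Loaded {load_job.output_rows} rows")
```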
The ETL process must handle new data as well as update existing data when event data is finalized and when Page and Feature rules are created or updated in Pendo. Track Events aren't subject to retroactive processing.
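One way to accommodate those retroactive updates, assuming your warehouse is BigQuery and your events table is partitioned by day, is to reload each affected UTC day with a truncating write so the re-exported files replace the old rows instead of duplicating them. The sketch below uses BigQuery's partition decorator syntax; all names are hypothetical.

```python
# Sketch: replace a single UTC day's partition when Pendo re-exports that day.
# Assumes all_events is a BigQuery table partitioned by event date; the
# "$YYYYMMDD" partition decorator targets only that day's partition.
# All names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()

def reload_day(day: str, uri: str) -> None:
    """Overwrite one day's partition with the re-exported Avro files."""
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.AVRO,
        # WRITE_TRUNCATE on a partition decorator replaces only that
        # partition, so updated exports don't duplicate earlier rows.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    destination = f"my_project.pendo.all_events${day}"  # e.g. "20250101"
    client.load_table_from_uri(uri, destination, job_config=job_config).result()

reload_day("20250101", "gs://my-pendo-bucket/datasync/2025-01-01/*.avro")
```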
If your data engineer isn't familiar with Pendo data, we recommend that they work with someone in your organization who understands data in Pendo. For information about Pendo event data and metadata, see Events overview and Configure visitor and account metadata.
To establish a helpful, long-term foundational table structure, we also recommend that your data engineer work with someone who understands the end goals of the implementation.
Some data engineers prefer an extract, load, and transform (ELT) process over ETL for data integration. With ELT, data is loaded into the warehouse first and transformed into the final data structure afterward. Either ETL or ELT is viable, so long as the process properly accommodates the data additions and updates detailed in this article.
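For instance, under an ELT approach you might load the raw Avro rows into a staging table unchanged and then run the transformation as SQL inside the warehouse. Here's a minimal sketch against BigQuery; the table and field names are illustrative, not Pendo's actual schema.

```python
# Sketch of the "T" in ELT: raw Avro rows already sit in a staging table,
# and a SQL statement inside the warehouse shapes the final table.
# Table and field names are illustrative, not Pendo's actual schema.
from google.cloud import bigquery

client = bigquery.Client()
client.query(
    """
    CREATE OR REPLACE TABLE `my_project.pendo.events_clean` AS
    SELECT
      visitorId,
      accountId,
      TIMESTAMP_MILLIS(`timestamp`) AS event_ts,  -- assumes epoch-millis field
      eventType
    FROM `my_project.pendo.events_raw`
    """
).result()  # wait for the transformation to complete
```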
As part of this process, we recommend:
- Listening for updates (new Pendo exports) hourly.
- Ensuring files are loaded in parallel, not one at a time.
- Ensuring that partitioning matches Pendo's export pattern (UTC days), as sketched after this list.
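Here's a rough sketch of those three recommendations together, again assuming GCS and BigQuery with hypothetical names: poll the bucket on an hourly schedule, group new files by their UTC-day folder, and submit the load jobs in parallel rather than one at a time.

```python
# Sketch: poll the bucket for exports and load files in parallel, one
# BigQuery load job per UTC-day folder. All names are hypothetical, and a
# real pipeline would also track which files it has already processed.
# Requires google-cloud-storage and google-cloud-bigquery.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from google.cloud import bigquery, storage

bq = bigquery.Client()
gcs = storage.Client()

def load_day(day: str, uris: list[str]) -> None:
    """Append one UTC day's Avro files to the events table."""
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.AVRO,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    bq.load_table_from_uri(
        uris, "my_project.pendo.all_events", job_config=job_config
    ).result()

def poll_once() -> None:
    # Group Avro files by their UTC-day folder so table partitions line up
    # with Pendo's export pattern.
    by_day: dict[str, list[str]] = defaultdict(list)
    for blob in gcs.list_blobs("my-pendo-bucket", prefix="datasync/"):
        if blob.name.endswith(".avro"):
            day = blob.name.split("/")[1]  # assumes datasync/<YYYY-MM-DD>/...
            by_day[day].append(f"gs://my-pendo-bucket/{blob.name}")

    # Submit one load job per day in parallel instead of serially.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(load_day, d, u) for d, u in by_day.items()]
        for future in futures:
            future.result()  # surface any load errors

poll_once()  # run on an hourly schedule (e.g., cron or Cloud Scheduler)
```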
Get started
Data Sync sends Avro files to your cloud storage container, where they can be picked up and loaded into your data lake or warehouse. Broadly, setting up Data Sync involves the following tasks:
Step 1. Create and manage destinations
Setting up Data Sync involves configuring a cloud storage destination and defining it in Pendo. You can set up one cloud storage destination for each Pendo subscription.
The exact process and prerequisites depend on the destination you choose: Data Sync supports Google Cloud Storage (GCS), Amazon Simple Storage Service (S3), or Microsoft Azure Storage. For instructions, see the following articles:
- Set up Data Sync with Google Cloud
- Set up Data Sync with Amazon S3 using user access keys
- Set up Data Sync with Amazon S3 using IAM roles
- Set up Data Sync with Microsoft Azure Storage
Alternatively, you can set up a Pendo-hosted bucket in GCS as your destination. For more information, see Set up Data Sync with a Pendo-hosted Google Cloud Storage (GCS) destination.
After you’ve configured a destination, you can find and optionally update the bucket path by selecting Manage destination from the Data Sync page.
Step 2. Create exports
After you've set up a destination, you can create:
- Event exports. These are configured and run at the application level. They can be one-time exports or recurring exports. For more information and instructions, see Data Sync event export handling.
- Visitor and account metadata exports. These are configured and run at the subscription level. They can be one-time exports or ongoing metadata updates. For more information and instructions, see Data Sync account and visitor export handling.
If you're not yet a Data Sync customer, you can instead create a single test export.
Test exports
You can create a single test export containing one day of Pendo event data. This lets your data engineering team see how Pendo data appears in Data Sync Avro files and plan the ETL pipeline required to pull that data from your cloud storage.
Go to Settings > Data Sync and start the setup process for one of the supported cloud storage services. After setting up a destination, create an export and choose the Test export option.