Overview of Pendo Data Sync

Pendo Data Sync allows you to push data out of Pendo into your data lake or warehouse and business intelligence (BI) tools so that you can perform complex queries and analyses. With Pendo data blended with other key data sources, you can:

  • Measure the impact of product improvements on sales and renewals.
  • Identify friction in your end-to-end user journey, from top-of-funnel acquisition to in-product activation.
  • Calculate a comprehensive customer health score across data sources to shape renewal strategy.
  • Create a churn-risk model from a foundation of product usage and sentiment signals.
  • Identify up-sell and cross-sell opportunities to drive data-informed account growth.

Prerequisites

Data Sync is a paid feature. Contact your Pendo representative for access. You can also try Data Sync on a subset of data first. For information, see Test export in this article.

You must be a subscription admin in Pendo to set up Data Sync.

How it works

Data Sync first involves choosing one of the following cloud storage services to set up as a destination for your Pendo data: Google Cloud Storage, Amazon S3, or Microsoft Azure Storage. Data Sync then delivers every event captured in Pendo, in Avro file format, to your chosen cloud storage destination.

Set up recurring daily exports and backfill up to three calendar years of historical data. For example, if you create an export on December 31, 2024, the export can extend back to January 1, 2021. Pendo also automatically sends updated files whenever a Page or Feature rule is added or updated in Pendo.
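The backfill window in that example can be computed as follows. This is a minimal sketch, assuming the window reaches back to January 1 of the third calendar year before the export's creation year (the function name is illustrative):

```python
from datetime import date

def earliest_backfill_date(export_created: date) -> date:
    """Earliest date a backfill can reach, assuming the three-calendar-year
    window described above: January 1 of the year three years before the
    export's creation year."""
    return date(export_created.year - 3, 1, 1)

# The example from the text: an export created on December 31, 2024
# can extend back to January 1, 2021.
print(earliest_backfill_date(date(2024, 12, 31)))  # 2021-01-01
```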

You then create an ETL pipeline to move data from your cloud storage service into a data lake or warehouse, such as Snowflake, Databricks, Google BigQuery, or Amazon Redshift. The ETL for Data Sync section in this article provides guidance on setting up your Data Sync ETL pipeline.

After data is loaded into your data lake or warehouse, your analytics team can blend it with other data sources and push those insights into your Business Intelligence (BI) source-of-truth, such as Looker, Tableau, Power BI, or Mode.

ETL for Data Sync

Extract, transform, and load (ETL) is a standard way for organizations to combine data from multiple systems into a data warehouse or data lake. For an overview of ETL and how it works, see the Google Cloud article: What is ETL?

For a typical Data Sync implementation, a data engineer at your organization creates an ETL pipeline that listens for new Pendo exports appearing in your storage bucket. Using those files, the automated process built by your engineer creates and updates tables in your data lake or warehouse (herein, warehouse).

For an example of an ETL script that shows how to pull Data Sync exports from Google Cloud Storage (GCS) into Google BigQuery, see data-sync-gcp-export-loading-example.

The ETL process must handle new data as well as update existing data when event data is finalized and when Page and Feature rules are created or updated in Pendo. Track Events aren't subject to retroactive processing.
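One way to accommodate those retroactive updates is to upsert rows on a unique event key, so that a re-exported row replaces the earlier version rather than duplicating it. The following sketch uses SQLite as a stand-in for your warehouse; the `event_id` key and column names are illustrative, not Pendo's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_id   TEXT PRIMARY KEY,  -- illustrative unique key
        visitor_id TEXT,
        num_events INTEGER
    )
""")

def upsert_events(rows):
    # ON CONFLICT replaces the earlier version of a re-exported row,
    # which is how finalized event data or updated Page/Feature rules
    # can overwrite previously loaded values.
    conn.executemany(
        """INSERT INTO events (event_id, visitor_id, num_events)
           VALUES (?, ?, ?)
           ON CONFLICT(event_id) DO UPDATE SET
               visitor_id = excluded.visitor_id,
               num_events = excluded.num_events""",
        rows,
    )

upsert_events([("e1", "v1", 3), ("e2", "v2", 1)])  # initial daily load
upsert_events([("e1", "v1", 5)])                   # retroactive update
print(conn.execute(
    "SELECT num_events FROM events WHERE event_id = 'e1'").fetchone()[0])  # 5
```

Most warehouses offer an equivalent, such as a `MERGE` statement, for the same pattern.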

If your data engineer isn't familiar with Pendo data, we recommend that your engineer work with someone in your organization who understands data in Pendo. For information about Pendo event data and metadata, see Events overview and Configure visitor and account metadata.

To establish a helpful, long-term foundational table structure, we also recommend that your data engineer work with someone who understands the end goals of the implementation.

Some data engineers prefer an extract, load, and transform (ELT) process over the ETL process for data integration. ELT involves first loading data into a data warehouse before it's transformed into the final data structure. Either ETL or ELT is a viable option, so long as the process properly accommodates the data additions and updates detailed in this article.

As part of this process, we recommend:

  • Listening for updates (new Pendo exports) hourly.
  • Ensuring files are loaded in parallel, not one at a time.
  • Ensuring that partitioning matches our export pattern (UTC days).
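These recommendations can be sketched as follows, assuming a hypothetical object-key layout that embeds the UTC export day (check your actual export paths for the real pattern):

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical key pattern; inspect your own exports for the real layout.
DAY_RE = re.compile(r"exportdate=(\d{4}-\d{2}-\d{2})")

def group_by_utc_day(keys):
    """Group object keys by the UTC day embedded in their path, so each
    warehouse partition is loaded from exactly one export day's files."""
    days = defaultdict(list)
    for key in keys:
        m = DAY_RE.search(key)
        if m:
            days[m.group(1)].append(key)
    return dict(days)

def load_all(keys, load_file, max_workers=8):
    """Load files in parallel rather than one at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(load_file, keys))
```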

Setup

Data Sync sends Avro files to your cloud storage container, where they can be picked up and loaded into your data lake or warehouse. You can only set up one cloud storage destination per Pendo subscription. Data Sync supports Google Cloud Storage (GCS), Amazon Simple Storage Service (S3), or Microsoft Azure Storage. For instructions, see the following articles:

Export creation

After you've set up a destination, you can create a daily Recurring export or a historical One-time export. For more information and instructions on creating an export, see Data Sync export handling.

If you're not yet a Data Sync customer, you can instead create a single test export.

Test export

You can create a single test export containing one day of Pendo data so that your data engineering team can see how your Pendo data appears in Data Sync Avro files and plan the ETL pipeline required to pull Pendo data from your cloud storage.

Go to Settings > Data Sync and start the setup process for one of the three supported cloud storage services. After setting up a destination, you can create an export where you can choose the Test export option.
