Overview of Pendo Data Sync

Pendo Data Sync allows you to push data out of Pendo and centralize it in a data lake or warehouse for additional analysis. With Pendo Data Sync fully implemented and connected to other data sources, you can:

  • Use Pendo data in your business intelligence (BI) visualizations and dashboards.
  • Measure the impact of product enhancements and guide campaigns on sales and renewals.
  • Calculate a comprehensive customer health score to shape your renewal strategy.
  • Create a custom churn-risk model from a foundation of product usage and sentiment signals.
  • Identify upsell and cross-sell opportunities to drive data-informed account growth.

Prerequisites

You must be a subscription admin in Pendo to set up Data Sync. Contact your Pendo representative for access to Data Sync. 

You can also try Data Sync on a subset of data first. For more information, see the Test exports section in this article.

How it works

Data Sync can deliver Pendo data in one of two ways:

  1. Data tables synced directly to a warehouse destination (Snowflake).
  2. Avro files to a cloud storage destination (Amazon S3, Google Cloud Storage, or Azure Storage).

After a destination is configured, you can begin to sync event data and account and visitor metadata.

  • Event data is configured at the application level (you can choose which applications to include).
  • Account and visitor metadata is optional and configured at the subscription level.

Regardless of your data destination type, Data Sync includes the same underlying dataset. See Data Sync schema definitions for more information about the data available in Data Sync.

Warehouse destinations

If you choose a warehouse destination such as Snowflake, Data Sync creates and syncs tables into your warehouse. For more information, see the articles Set up Data Sync to Snowflake and Data Sync to Snowflake architecture.

Once your warehouse destination is configured, you can choose which applications you want to sync from Pendo to your warehouse.

You can also choose to sync accounts and visitors to your warehouse. If activated, Pendo automatically creates tables of accounts, account metadata, visitors, and visitor metadata and syncs updates on a nightly basis.

See the Data Sync to Snowflake ERD for more information on the standard warehouse table schema. To access the ERD, you'll need to provide the password: datasync. If you have issues with access, contact support.

Cloud storage destinations

You can use cloud storage services like Amazon S3, Google Cloud Storage, or Microsoft Azure Storage as destinations for Data Sync exports. Data Sync sends Avro files to your cloud storage along with an export manifest JSON file that can be used for the custom ETL required to bring data from cloud storage to your data lake or warehouse. For setup instructions, see the Data Sync setup article for your cloud storage provider.

Alternatively, you can pull data from a Pendo-hosted Google Cloud Storage (GCS) bucket. See Set up Data Sync with a Pendo-hosted Google Cloud Storage (GCS) destination for more information.

After setting up a cloud storage destination, you can:

  • Set up recurring daily exports.
  • Backfill up to three calendar years of historical data. For example, if you create an export on December 31, 2025, it can extend back to January 1, 2022.
  • Automatically receive updates when Page or Feature rules are added or updated in Pendo.
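
The backfill window in the example above comes down to simple date arithmetic. The following Python sketch assumes the window always starts on January 1 of the year three years before the export is created. That rule is inferred from the single example above and isn't a documented guarantee, and the function name earliest_backfill_date is hypothetical.

```python
from datetime import date

BACKFILL_YEARS = 3  # "up to three calendar years", per this article


def earliest_backfill_date(export_created: date) -> date:
    """Return the earliest day a backfill could cover (assumed rule)."""
    # Assumption: the window reaches back to January 1 of the year that is
    # BACKFILL_YEARS before the year in which the export is created.
    return date(export_created.year - BACKFILL_YEARS, 1, 1)


print(earliest_backfill_date(date(2025, 12, 31)))  # 2022-01-01
```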

Create and manage exports

After you've set up a destination, you can create:

  • Event exports. These are configured and run at the application level. They can be one-time exports or recurring exports. For more information and instructions, see Data Sync event export handling.
  • Visitor and account metadata exports. These are configured and run at the subscription level. They can be one-time exports or ongoing metadata updates. For more information and instructions, see Data Sync account and visitor export handling.

If you're not yet a Data Sync customer, you can instead create a single test export.

ETL for Data Sync to Cloud Storage

A data engineer must create a pipeline to move files from cloud storage into your data platform. This can be either an ETL (extract, transform, and load) or ELT (extract, load, and transform) process. ELT involves first loading data into a data warehouse before it's transformed into the final data structure. Either ETL or ELT is a viable option as long as the pipeline supports the data additions and updates described in this article. The choice of using ETL or ELT depends on your organization's data architecture. For more information, see Google's article What is ETL?

For a typical Data Sync implementation, a data engineer at your organization creates an ETL pipeline that listens for new Pendo exports in your storage bucket. Using those files, the automated process creates and updates tables in your data lake or warehouse.

The pipeline must:

  • Process new data as it is exported.
  • Update existing records when data is finalized.
  • Accommodate retroactive changes when a Page or Feature rule is added or updated in Pendo.
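
One possible way to meet these requirements is to treat each application's UTC day as a unit that a later export can replace in full. The Python sketch below illustrates that idea with an in-memory dictionary standing in for a warehouse table. The names apply_export, app_id, and utc_day, and the row fields, are hypothetical and don't reflect the actual Data Sync schema.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a warehouse table, keyed by
# (application ID, UTC day). A real pipeline would target actual tables.
PartitionKey = Tuple[str, str]
Table = Dict[PartitionKey, List[dict]]


def apply_export(table: Table, app_id: str, utc_day: str, rows: List[dict]) -> None:
    """Replace the whole partition with the rows from the latest export."""
    # Overwriting rather than appending keeps the load idempotent: a re-export
    # of the same day (finalized data or a rule change) never duplicates rows.
    table[(app_id, utc_day)] = list(rows)


table: Table = {}
# The first export for the day arrives with preliminary data.
apply_export(table, "app-1", "2025-01-15", [{"visitor": "v1", "events": 3}])
# A later export for the same day carries finalized data and replaces it.
apply_export(table, "app-1", "2025-01-15", [{"visitor": "v1", "events": 5}])
# A new day is added alongside the existing one.
apply_export(table, "app-1", "2025-01-16", [{"visitor": "v2", "events": 1}])
```

Because the same operation handles new days, finalized days, and retroactive rule changes, the pipeline can be rerun safely on any export.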

If a data engineer isn't familiar with Pendo data, we recommend that they work with someone in your organization who understands data in Pendo. For information about Pendo event data and metadata, see Events overview and Configure visitor and account metadata.

To design a helpful, long-term foundational table structure, ensure that the data engineer works with someone who understands the end goals of the implementation.

As part of this process, we recommend the following when building your pipeline:

  • Listen for updates (new Pendo exports) hourly.
  • Load files in parallel, not sequentially.
  • Partition data by UTC day to match Data Sync's export pattern.
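
The last two recommendations can be sketched in Python as follows. This is an illustration only: it uses a thread pool from the standard library to read several files at once and groups rows by UTC day. The field name timestamp_ms, the file names, and the fake_read_file helper are hypothetical. Reading real Avro files requires an Avro library, which is omitted here.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List


def utc_day(epoch_ms: int) -> str:
    """Return the UTC calendar day (YYYY-MM-DD) for a millisecond timestamp."""
    moment = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


def load_in_parallel(
    paths: Iterable[str],
    read_file: Callable[[str], List[dict]],
    max_workers: int = 8,
) -> Dict[str, List[dict]]:
    """Read files concurrently and group their rows by UTC day."""
    partitions: Dict[str, List[dict]] = defaultdict(list)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs read_file on several files at once and yields the
        # results in input order.
        for rows in pool.map(read_file, paths):
            for row in rows:
                partitions[utc_day(row["timestamp_ms"])].append(row)
    return dict(partitions)


def fake_read_file(path: str) -> List[dict]:
    """Hypothetical stand-in for an Avro reader: returns canned rows."""
    samples = {
        "a.avro": [{"timestamp_ms": 1736899200000}],  # 2025-01-15 00:00:00 UTC
        "b.avro": [{"timestamp_ms": 1736985599000}],  # 2025-01-15 23:59:59 UTC
        "c.avro": [{"timestamp_ms": 1736985600000}],  # 2025-01-16 00:00:00 UTC
    }
    return samples[path]


partitions = load_in_parallel(["a.avro", "b.avro", "c.avro"], fake_read_file)
```

Converting timestamps with an explicit UTC time zone avoids rows landing in the wrong partition when the pipeline runs on a machine set to a local time zone.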

To set up your Data Sync ETL pipeline, see Data Sync event export handling.

Data estimation

If you’re not yet a Data Sync customer and want to understand how much data can be synced from Pendo, you can run a data estimate for your accounts, visitors, and application-level event data before purchasing Data Sync.

To run a data estimation:

  1. In Pendo, go to Settings > Data Sync.
  2. Select Run data estimate. This opens the Data to estimate window.
  3. Select which sources you want a data estimate for.
  4. Select Run estimate.

This starts the estimation process in the background and redirects you to the Data Sync page. The estimates may take a few minutes to complete. Pendo notifies you by email when the estimates are ready. 

For each application you select to estimate, you receive: 

  • A data volume estimate for the amount of data you can expect to receive in cloud storage for one day of event data.
  • A data volume estimate for a 12-month backfill of historical data.

For accounts and visitors, you receive a data volume estimate of the full sync of account and visitor metadata.

If you're using Snowflake as your data destination, Pendo also estimates the data volume for Snowflake storage. Pendo assumes a compression rate of approximately 50% between cloud storage and Snowflake, but actual compression rates may vary.
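
As a rough illustration of the Snowflake estimate, the Python sketch below assumes that a 50% compression rate means Snowflake stores about half of the cloud storage volume. That interpretation, the function name estimate_snowflake_size, and the sample figure of 120 GB are assumptions for illustration, not values from Pendo.

```python
COMPRESSION_RATE = 0.5  # approximate figure from this article; actual rates vary


def estimate_snowflake_size(
    cloud_storage_size: float,
    compression_rate: float = COMPRESSION_RATE,
) -> float:
    """Estimate Snowflake storage in the same unit as the input."""
    # Assumption: a 50% compression rate means Snowflake stores roughly half
    # of the cloud storage volume.
    return cloud_storage_size * (1 - compression_rate)


backfill_gb = 120.0  # hypothetical 12-month backfill estimate in cloud storage
print(estimate_snowflake_size(backfill_gb))  # 60.0
```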

Test exports

If you're not yet a Data Sync customer, you can create a single test export containing one day of Pendo event data. This lets your organization's data engineering team see how your Pendo data appears in Data Sync Avro files and plan the ETL pipeline required to pull Pendo data from your cloud storage.

Go to Settings > Data Sync and start the setup process for one of the supported cloud storage services. After you set up a destination, create an export and choose the Test export option.
