This article explains the architecture and data flow for syncing Pendo data to Snowflake. It includes an overview of the infrastructure, transfer process, and stored procedures that power the integration. For help setting up the destination, see Set up Pendo Data Sync to Snowflake.
Overview
Pendo Data Sync enables you to sync data tables directly from your Pendo subscription to your Snowflake account, with no developer resources required. This gives you full ownership over your data, including how it's stored, queried, and analyzed.
Because data is stored in your Snowflake environment, you're responsible for all Snowflake storage and any costs associated with processing the data.
Data Sync to Snowflake offers the same data and uses the same underlying data infrastructure as Data Sync to cloud storage.
Required permissions and roles
The following entity creation and permission-granting operations are required. Pendo provides a SQL script to help facilitate the following actions:
- Create a user that is associated with a public key generated by Pendo.
- Create a warehouse and database to be used exclusively by Pendo.
- Create a role that is granted the following permissions in Snowflake.
The different permissions and their purposes can be summarized below:
| Object | Permission required | Notes |
| Account permissions | CREATE INTEGRATION | Permission used to CREATE STORAGE INTEGRATION. This can be revoked after the Snowflake destination has been created. |
| Database permissions (in database created for Pendo data) | USAGE | |
| MONITOR | ||
| CREATE SCHEMA |
Pendo will have OWNERSHIP privileges to Pendo-created schemas and any objects within the schema we create. For more information, see the section Snowflake objects created by Pendo in the article Data Sync to Snowflake architecture. CREATE SCHEMA privileges can be revoked after all desired syncs have been created. CREATE SCHEMA will be required to create additional syncs. |
|
| Warehouse permissions (in data warehouse created for Pendo data) | USAGE | |
| MONITOR | ||
| Schema permissions (in schema created for Pendo data) | OWNERSHIP | Granted when the schema is created by Pendo. Required only during schema creation. |
| USAGE | Required for ongoing syncs after schema creation. OWNERSHIP can be revoked and replaced with USAGE once the schema exists. |
When a schema is created, PENDO_TRANSFER_ROLE is granted OWNERSHIP privileges. For ongoing syncs, the role only requires USAGE permissions on the schema, so you can revoke OWNERSHIP privileges after the schema is created.
Available data types
After a Snowflake destination is configured, you can begin to sync event data and account and visitor metadata.
- Event data is configured at the application level (you can choose which applications to include).
- Account and visitor metadata is optional and configured at the subscription level.
For more information, see Data Sync schema definitions. See the Data Sync to Snowflake ERD for more information on the standard warehouse table schema. To access the ERD, you need to provide the password: datasync. If you have issues with access, contact Pendo support.
Data flow from Pendo to Snowflake
Pendo stores captured data in a unique location for each customer. When Data Sync is implemented to any destination, Avro files are created for all requested data and loaded into a transitory storage location in Pendo GCP. For Snowflake destinations, data is loaded from a Pendo-owned GCS bucket with a 14-day retention policy into Snowflake using a series of Snowflake stored procedures.
These stored procedures are detailed in the diagram below and are also visible from within your Snowflake account.
Snowflake objects created by Pendo
Event objects
When you activate a transfer of event data for a specific application, Pendo creates a schema named SUB_{SUBID}_APP_{APPID} in the specified database. Each application synced from Pendo to Snowflake will have their own unique schema in Snowflake. Within each application-specific schema, Pendo creates the following tables:
AllEventsFeaturesPagesGuidesTrackTypesMatchedFeatureEventsMatchedPagesEventsMatchedTrackTypeEvents
These stored procedures are created in your Snowflake account during setup. Each one loads and merges a specific type of data.
AllEvents_Load_And_MergeFeatures_Load_And_MergeGuides_Load_And_MergePages_Load_And_MergeTrackTypes_Load_And_MergeMatchedFeatures_Load_And_MergeMatchedPages_Load_And_MergeMatchedTrackTypes_Load_And_Merge
Other objects also created and loaded by Pendo:
- A file format named
Pendo_Avro_Format. - A stage named
Pendo_Stage.
Account and visitor objects
When you activate a transfer of visitor data, Pendo creates a schema named SUB_{SUBID}_VISITOR in the specified database. Within the schema, Pendo creates the following tables:
VisitorsVisitorMetadata
These stored procedures are created in your Snowflake account during setup. Each one loads and merges a specific type of data.
Visitors_Load_And_MergeVisitorsMetadata_Load_And_Merge
Similarly, when you activate a transfer of account data, Pendo creates a schema named SUB_{SUBID}_ACCOUNT in the specified database. Within the schema, Pendo creates the following tables:
AccountsAccountMetadata
These stored procedures are created in your Snowflake account during setup. Each one loads and merges a specific type of data.
Accounts_Load_And_MergeAccountsMetadata
Other objects also created and loaded by Pendo:
- A file format named
Pendo_Avro_Format. - A stage named
Pendo_Stage.
Retroactive processing and stale data
When you update the tagging rules for a Feature, Page, or Track Event in Pendo, previously matched events may no longer satisfy the new definition. Similarly, if you delete a business object, its associated matched-event rows remain in the table with no signal that they're invalid.
To address this, the Pendo retroactive processing job periodically re-confirms active matches by re-exporting them with an updated lastUpdatedAt timestamp. Use this timestamp to detect and filter out stale data.
Identify stale and valid matched events
Use the following queries to filter matched-event rows based on whether they reflect current or outdated matching rules. These examples use the MATCHEDFEATUREEVENTS table, but the same pattern applies to Pages and Track Events.
To find stale matched-event rows:
-- Feature events (same pattern applies for Pages and Track Events)
SELECT mfe.*
FROM MatchedFeatureEvents mfe
JOIN Features f ON f.featureId = mfe.matchableId
WHERE f.deletedAt IS NOT NULL
OR (f.softDeleteMethod = 2
AND (mfe.lastUpdatedAt IS NULL OR mfe.lastUpdatedAt < f.rulesUpdatedAt));To find valid matched-event rows:
SELECT mfe.*
FROM MatchedFeatureEvents mfe
JOIN Features f ON f.featureId = mfe.matchableId
WHERE f.deletedAt IS NULL
AND (f.softDeleteMethod = 1
OR (f.softDeleteMethod = 2 AND mfe.lastUpdatedAt >= f.rulesUpdatedAt));A matched event is considered valid when all of the following are true:
- The business object hasn't been deleted (
deletedAt IS NULL). - The business object uses the legacy soft delete method (
softDeleteMethod = 1), or the match was confirmed after the most recent rule change (lastUpdatedAt >= rulesUpdatedAt).
Rows where lastUpdatedAt is null predate the migration and aren't included in staleness evaluation.