Understanding How Correlated Connects With Cloud Data Warehouses

Correlated can connect to existing models in Cloud Data Warehouses so that you can continue to own your data and leverage your data teams effectively. Typically, our suggestion is for your data team to leverage views to transform your data into tables that Correlated can ingest. For more information on building views, check out the documentation for the data warehouse you are using:

Populating Account and User Dimensions in Correlated with Versioned Tables

Correlated populates Accounts and Users with dimensions that describe that Account and User based on the last known value. For example, an Account's "Geography" is the US right now, but it might have been France five years ago. In this case, the dimension value for the "Geography" dimension will be the US. Similarly, if you create a dimension that tracks total logins, the value for that dimension will be the last known number of logins.

A versioned table stores how dimension values change over time. You can create as many tables as you want to describe Accounts and Users. We recommend that you create two tables - one that contains Account data and one that contains User data. Creating multiple tables that represent Accounts and multiple tables that represent Users will cause edge cases where dimensions that live in different tables are updated at different times, but are used in the same Signal to trigger a Workflow. In certain cases, you'll end up triggering off an updated version of one dimension, and an outdated version of another dimension. If you store all the dimensions you care about for a User in a single table, Correlated will pull all those dimensions at the same time so that you avoid this issue. In practice, depending on your use case, this level of data divergence may not matter to you, in which case you can add as many tables as you want.

An Account Versioned Table must have a timestamp column that represents when the row of data was updated in the table, an Account ID that globally identifies the account across ALL data sources, and an arbitrary number of columns that represent dimensions. Here's an example table:

timestampaccountIdtotalLogins
2020-02-01org_111
2020-02-01org_210
2020-02-01org_38
2020-01-31org_110
2020-01-31org_29

A User Versioned Table will look exactly the same, but rather than an accountId, you'll include a userId. If you want to join users to accounts, you'll need to give Correlated a mapping of Accounts <-> Users, but it does not have to be in this table. Correlated only needs that mapping from one data source in order to carry it forward to other data sources.

In order to build dimensions from versioned tables, Correlated will take the last known value for each Account or User found in the versioned tables. These values are taken at a point in time and are not aggregated in any way. This also means that these values can not be aggregated at a future point within Correlated. As an example, you cannot sum up totalLogins in the users table and attach that as a dimension to an account.

Correlated pulls data from versioned tables incrementally, which means that your data will be pulled more often and will be more up to date.

Populating Account and User Dimensions in Correlated with Unversioned Tables

Correlated can also accept data from unversioned tables. This means that you do not include a timestamp that marks when a row was updated. Because there is no way to pull data from unversioned tables incrementally, Correlated will pull from unversioned tables less frequently.

There are two types of unversioned tables: tables where accounts and users are distinct (there is only one row for an account or user) and tables where accounts and users are not distinct.

We highly recommend that your unversioned tables contain distinct accounts and users only. For example, compared to the above example of a versioned Account table, your unversioned Account table would look like the below table:

accountIdtotalLogins
org_111
org_210
org_38

Notice that this table only contains one row per organization and only shows the most recent value for the totalLogins dimension. You can do the same thing with users, and again, like with versioned tables, you do not need to include the accountId to join users to accounts. Also, because there is no aggregation going on, dimensions cannot be aggregated at the account level.

Correlated does support unversioned tables where accounts and users are not distinct, and in this case, we do support aggregations. This is not recommended. The recommended path is for you to build a view that contains these aggregations and to instead go with an unversioned table with distinct account and user values.

If you want to use an unversioned table to build aggregated dimensions, you'll need to work directly with Correlated to do so. Let's go through a specific example to explain how this works. Let's say you have a table that stores the products that each account has purchased.

accountIdproductLine
org_1product_1
org_1product_2
org_2product_1

You want to create a dimension that represents the number of products purchased by an organization at this point in time. This would be a COUNT() of values in the productLine column, grouped by accountId. Correlated can support this by querying the entire table and generating a dimension for "Number of Products Purchased". In this example, the value for org_1 would be 2. The value for org_2 would be 1.

Similarly, Correlated can do the same thing for users. However, let's say that you have a table that contains users and the products that they have purchased. Just like with versioned tables, we cannot aggregate across these users to get an account dimension, even if you have given us an account <-> user mapping elsewhere in a different data source. Instead, you MUST provide us with an account versioned table that contains an account ID. We can then similarly COUNT() all values in the productLine column, grouped by accountId.

❗️

Creating aggregate dimensions on unversioned data sources is NOT RECOMMENDED

Again, Correlated can support aggregations on unversioned data sources, but this is not recommended. Instead, you should build a view that contains a distinct row for each account or user with the corresponding aggregate in a column.