Ingestion from Snowflake

This document shows how to ingest from a Snowflake table to a Tinybird data source using the CLI.

How it works

The Snowflake connector lets you run an arbitrary SQL query against a Snowflake table and ingest the result into a Tinybird data source.

It currently works by unloading data into a Google Cloud Storage bucket and ingesting it from there into a Tinybird data source, so you'll need access to a Google Cloud project to store temporary files.

The Snowflake connector is provided as an extra package of the CLI. To install it, run:

Install the Snowflake connector
pip install tinybird-cli[snowflake]

To start using the Snowflake connector, provide your credentials by running:

Configure the Snowflake connector
tb auth --connector snowflake

Snowflake Account (e.g. your-domain.west-europe.azure):
Snowflake warehouse name:
Snowflake database name:
Snowflake schema name:
Snowflake role name:
Snowflake user name:
Snowflake password:
Snowflake GCS storage integration name (leave empty to auto-generate one) []:
Snowflake GCS stage name (leave empty to auto-generate one) []:
Google Cloud project ID to store temporary files:
Path to a JSON service account file with permissions to write in Storage, sign URLs and IAM (leave empty to use GOOGLE_APPLICATION_CREDENTIALS environment variable) []:
Name of a Google Cloud Storage bucket to store temporary exported files:
** snowflake configuration written to .tinyb_snowflake file, consider adding it to .gitignore

You'll be prompted for your Google Cloud project ID, the path to a JSON service account file, and the name of a Google Cloud Storage bucket to store temporary files.

The configuration is saved in a local file named .tinyb_snowflake inside your data project.
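Since .tinyb_snowflake contains credentials, you may want to keep it out of version control, as the CLI output suggests. For example:

```shell
# Exclude the connector credentials file from version control
echo ".tinyb_snowflake" >> .gitignore
```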

Another option is to pass those parameters on each call to the CLI, like this:

Configure the Snowflake connector at query time
tb \
    --sf-account sf_account_name \
    --sf-warehouse sf_warehouse_name \
    --sf-database sf_database_name \
    --sf-schema sf_schema_name \
    --sf-role sf_role_name \
    --sf-user sf_user_name \
    --sf-password sf_password \
    --sf-storage-integration sf_storage_integration_name \
    --sf-stage sf_stage_name \
    --gcp-project-id projectid \
    --gcs-bucket bucket_name \
    --google-application-credentials service_account.json \
    datasource append <...>

See this guide to learn how to configure a GCS storage integration in Snowflake. Otherwise, if the Snowflake user passed as a parameter to the connector has the right permissions, the CLI will try to create the storage integration automatically. In that case, the Google service account provided must have IAM management permissions.

Note: If not passed as a parameter, the CLI will look for a service account in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
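For example, you could export the variable before calling the CLI (the key-file path below is hypothetical):

```shell
# Point the CLI at a service account key file via the standard
# GOOGLE_APPLICATION_CREDENTIALS environment variable
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/tinybird-sa.json"
```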

Append to a Data Source

To extract data from a Snowflake table and ingest it into a Tinybird data source, run:

Extract from Snowflake and append to a data source
tb \
    --gcp-project-id projectid \
    --gcs-bucket bucket_name \
    --sf-account sf_account_name \
    --sf-warehouse sf_warehouse_name \
    --sf-database sf_database_name \
    --sf-schema sf_schema_name \
    --sf-role sf_role_name \
    --sf-user sf_user_name \
    --sf-password sf_password \
    --sf-storage-integration sf_storage_integration_name \
    --sf-stage sf_stage_name \
    datasource append tinybird_datasource_name \
    --connector snowflake \
    --sql "select * from snowflake_table_name"

This command is useful for bulk uploads, but sometimes you want to do incremental updates to keep your Tinybird data source synchronized with your Snowflake table.

To do that, you usually take the maximum insertion date from the Tinybird data source, extract from Snowflake starting at that date, and append to the Tinybird data source. You can run this whole workflow with this command:

Extract from Snowflake incrementally and append to a data source
tb \
    --gcp-project-id projectid \
    --gcs-bucket bucket_name \
    --sf-account sf_account_name \
    --sf-warehouse sf_warehouse_name \
    --sf-database sf_database_name \
    --sf-schema sf_schema_name \
    --sf-role sf_role_name \
    --sf-user sf_user_name \
    --sf-password sf_password \
    --sf-storage-integration sf_storage_integration_name \
    --sf-stage sf_stage_name \
    datasource append tinybird_datasource_name \
    --connector snowflake \
    --sql "select date, recipient_country, recipient_code, id from snowflake_table_name" \
    --incremental "date" # where "date" is the name of a date or timestamp column in the origin Snowflake table
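To keep the data source continuously synchronized, you might schedule the incremental command, for instance with a cron entry like this sketch (paths and log location are hypothetical; it assumes the connector credentials were saved to .tinyb_snowflake, so the --sf-* and GCS flags can be omitted):

```shell
# crontab fragment: run the incremental append at the top of every hour
0 * * * * cd /path/to/data-project && tb datasource append tinybird_datasource_name --connector snowflake --sql "select date, recipient_country, recipient_code, id from snowflake_table_name" --incremental "date" >> /var/log/tinybird_sync.log 2>&1
```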

Replace a Data Source

To fully replace the data in a data source, run:

Extract from Snowflake and fully replace the data in a data source
tb \
    --gcp-project-id projectid \
    --gcs-bucket tinybird \
    --sf-account sf_account_name \
    --sf-warehouse sf_warehouse_name \
    --sf-database sf_database_name \
    --sf-schema sf_schema_name \
    --sf-role sf_role_name \
    --sf-user sf_user_name \
    --sf-password sf_password \
    --sf-storage-integration sf_storage_integration_name \
    --sf-stage sf_stage_name \
    datasource replace tinybird_datasource_name \
    --connector snowflake \
    --sql "select * from snowflake_table"

You can also replace the data partially. For instance, if you want to replace the last month of data you can run this:

Extract from Snowflake and replace the data in a data source
NOW=`date +"%Y-%m-%d"`
ONE_MONTH=`date --date="${NOW} -1 month" +%Y-%m-%d`

tb \
    --gcp-project-id projectid \
    --gcs-bucket tinybird \
    --sf-account sf_account_name \
    --sf-warehouse sf_warehouse_name \
    --sf-database sf_database_name \
    --sf-schema sf_schema_name \
    --sf-role sf_role_name \
    --sf-user sf_user_name \
    --sf-password sf_password \
    --sf-storage-integration sf_storage_integration_name \
    --sf-stage sf_stage_name \
    datasource replace tinybird_datasource_name \
    --connector snowflake \
    --sql "select date, recipient_country, recipient_code, id from snowflake_table WHERE date BETWEEN TO_DATE('${ONE_MONTH}','YYYY-MM-DD') AND TO_DATE('${NOW}','YYYY-MM-DD')" \
    --sql-condition "date BETWEEN toDate('${ONE_MONTH}') AND toDate('${NOW}')"
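Note that the date arithmetic above relies on GNU date; on macOS/BSD the --date flag is unavailable. A portable sketch (assuming BSD date's -v flag for the fallback branch):

```shell
# Compute NOW and ONE_MONTH on either GNU date (Linux) or BSD date (macOS)
NOW=$(date +"%Y-%m-%d")
if date --date="1 month ago" +%Y-%m-%d >/dev/null 2>&1; then
    ONE_MONTH=$(date --date="${NOW} -1 month" +%Y-%m-%d)   # GNU date
else
    ONE_MONTH=$(date -v-1m +%Y-%m-%d)                      # BSD date
fi
```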