Ingestion from BigQuery

This document shows how to ingest from a BigQuery table to a Tinybird data source using the CLI.

How it works

The BigQuery connector lets you run an arbitrary SQL query over a BigQuery table and ingest the result into a Tinybird data source.

It uses the EXPORT DATA statement to export the results of the query to a CSV file in a Google Cloud Storage bucket.
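
Under the hood, the connector wraps your query in an EXPORT DATA statement similar to the sketch below. The bucket path and options are illustrative, not the connector's exact internals:

Illustrative EXPORT DATA statement
-- Roughly the kind of statement run on your behalf; path and options are examples
EXPORT DATA OPTIONS (
  uri = 'gs://bucket_name/tmp/export-*.csv',
  format = 'CSV',
  overwrite = true,
  header = true
) AS
SELECT date, recipient_country, recipient_code, id
FROM `projectid.dataset.table`;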

Note: You’ll be billed according to your BigQuery project pricing terms.

The BigQuery connector is provided as an extra package of the CLI. To install it, run:

Install the BigQuery connector
pip install "tinybird-cli[bigquery]"

To start using the BigQuery connector you need to provide your credentials by running:

Configure the BigQuery connector
tb auth --connector bigquery

BigQuery project ID: projectid
Path to a JSON service account file with permissions to export from BigQuery, write in Storage and sign URLs (leave empty to use GOOGLE_APPLICATION_CREDENTIALS environment variable) []:
Name of a Google Cloud Storage bucket to store temporary exported files: bucket_name
** bigquery configuration written to .tinyb_bigquery file, consider adding it to .gitignore

You’ll be prompted for your BigQuery project ID, the local path to a JSON service account file, and the name of a Google Cloud Storage bucket.
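
If you don't already have a suitable service account, one possible way to create the key file and grant BigQuery and Storage access with gcloud and gsutil is sketched below. The account name and the role choices are assumptions; adjust them to your project's security policies:

Create a service account and key file (illustrative)
# All names below are examples; replace projectid, bucket_name and the account name
gcloud iam service-accounts create tb-bigquery-export --project projectid
gcloud projects add-iam-policy-binding projectid \
    --member="serviceAccount:tb-bigquery-export@projectid.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding projectid \
    --member="serviceAccount:tb-bigquery-export@projectid.iam.gserviceaccount.com" \
    --role="roles/bigquery.jobUser"
gsutil iam ch \
    serviceAccount:tb-bigquery-export@projectid.iam.gserviceaccount.com:objectAdmin \
    gs://bucket_name
gcloud iam service-accounts keys create service_account.json \
    --iam-account=tb-bigquery-export@projectid.iam.gserviceaccount.com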

The tb auth command saves the configuration in a local file named .tinyb_bigquery inside your data project.
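
For example, to keep it out of version control:

Add the connector configuration to .gitignore
echo ".tinyb_bigquery" >> .gitignore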

Another option is passing those three parameters on each call to the CLI, like this:

Configure the BigQuery connector at query time
tb \
    --gcp-project-id projectid \
    --gcs-bucket bucket_name \
    --google-application-credentials service_account.json \
    datasource append <...>

Note: If not passed as a parameter, the CLI will look for a service account in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
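
For example, you can export the variable once and omit the credentials flag on subsequent calls:

Use the GOOGLE_APPLICATION_CREDENTIALS environment variable
export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/service_account.json"

tb \
    --gcp-project-id projectid \
    --gcs-bucket bucket_name \
    datasource append <...>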

Append to a Data Source

To extract data from a BigQuery table and ingest it into a Tinybird data source, run:

Extract from BigQuery and append to a data source
tb \
    --gcp-project-id projectid \
    --gcs-bucket bucket_name \
    datasource append test__bigquery_datasource \
    --connector bigquery \
    --sql "select date, recipient_country, recipient_code, id from \`projectid.dataset.table\`"

This command is useful for bulk upload of data, but sometimes you want to do incremental updates to keep your Tinybird data source synchronized with your BigQuery table.

To do that, you usually take the maximum insertion date in the Tinybird data source, extract only the rows newer than that date from BigQuery, and append them to the Tinybird data source. You can run this whole workflow with this command:

Extract from BigQuery and incrementally append to a data source
tb \
    --gcp-project-id projectid \
    --gcs-bucket bucket_name \
    datasource append test__bigquery_datasource \
    --connector bigquery \
    --sql "select date, recipient_country, recipient_code, id from \`projectid.dataset.table\`"
    --incremental "date" # where <date> is the name of a date or timestamp column in the origin BigQuery table

Replace a Data Source

To fully replace the data in a data source, run:

Extract from BigQuery and fully replace the data in a data source
tb \
    --gcp-project-id projectid \
    --gcs-bucket bucket_name \
    datasource replace test__bigquery_datasource \
    --connector bigquery \
    --sql "select date, recipient_country, recipient_code, id from \`projectid.dataset.table\`"

You can also replace the data partially. For instance, if you want to replace the last month of data, you can run this:

Extract from BigQuery and replace the data in a data source
NOW=`date +"%Y-%m-%d"`
ONE_MONTH=`date --date="${NOW} -1 month" +%Y-%m-%d`
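# Note: --date is a GNU date option; on macOS/BSD date the equivalent would be
# something like: ONE_MONTH=$(date -v-1m +"%Y-%m-%d")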

tb \
    --gcp-project-id projectid \
    --gcs-bucket bucket_name \
    datasource replace test__bigquery_datasource \
    --connector bigquery \
    --sql "select date, recipient_country, recipient_code, id from \`projectid.dataset.table\` WHERE date BETWEEN TO_DATE('${ONE_MONTH}','YYYY-MM-DD') AND TO_DATE('${NOW}','YYYY-MM-DD')" \
    --sql-condition "date BETWEEN toDate('${ONE_MONTH}') AND toDate('${NOW}')