Common Use Cases¶
This document shows some common use cases where the Command Line Interface (CLI) can help you in your day-to-day workflow.
Data projects and source control repositories¶
The Command Line Interface (CLI) works best when your data project is connected to a source control repository (such as git).
That way all your data source schemas, exploration pipes, transformations and endpoints can be managed as source code. This makes it possible for several developers to collaborate on the same data project, trace changes to endpoints through the commit history, or manage multiple environments and versions of your endpoints.
When you ask for support, the Tinybird team can work with you on the same code base, if that’s what you want.
Some of the following use cases assume your data project is connected to a source control repository (SCR). Take into account that this is something you have to manage yourself. The CLI only manages your data files and the connection to your Tinybird account; you have to push code to the SCR separately.
When to use versions and prefixes¶
Especially when any of your endpoints is in production, increase the version number when:
You change the schema of a data source
You change the output of an endpoint
That way you can keep both the old and the new version working. See the section working with versions for more information.
You can use the --force flag to overwrite pipes with the same version. However, you cannot overwrite data sources.
If you need to change the schema of a data source (e.g. adding new columns), you have to version it. Otherwise you’ll have to delete and recreate it, which will force you to delete any endpoint using it as well.
Use prefixes to:
Maintain different environments for your data sources and pipes (staging, development, production)
Work on feature branches or test different approaches for any of your pipes
Downloading all the pipes and data sources from your account¶
There are two ways to start working with the CLI: you can start a new data project from scratch or, if you already have some data and endpoints in your Tinybird account, pull them to your local disk and continue working from there.
For this second option, use the --match flag to filter pipes or data sources containing the string passed as a parameter, and the --prefix flag to pull only prefixed files.
For instance, to pull all the files whose names contain project and that have the prefix pro, you can do:
tb pull --match project --prefix pro
[D] writing pro__project.datasource(demo)
[D] writing pro__project_geoindex.datasource(demo)
[D] writing pro__project_geoindex_pipe.pipe(demo)
[D] writing pro__project_agg.pipe(demo)
[D] writing pro__project_agg_API_endpoint_request_log_pipe_3379.pipe(demo)
[D] writing pro__project_exploration.pipe(demo)
[D] writing pro__project_moving_avg.pipe(demo)
Once the files are pulled, you can push the changes to your source control repository and continue working from the command line.
When you pull data sources or pipes, your data is not downloaded, only the data source schemas and pipe definitions, so they can be replicated easily.
The pull command does not preserve the directory structure, so all your data files will be downloaded to your current directory.
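Since pulled files all land in the current directory, you may want to sort them into folders before committing them. The following is a minimal sketch, assuming a project layout with datasources/ and pipes/ folders (the layout suggested by the paths used in the push examples on this page). The function name sort_pulled_files is hypothetical and not part of the CLI.

```shell
# Hypothetical helper, not part of the Tinybird CLI.
# Moves *.datasource files into datasources/ and *.pipe files into pipes/.
# Assumption: your project keeps its files in those two folders.
sort_pulled_files() {
    dir="${1:-.}"
    mkdir -p "$dir/datasources" "$dir/pipes"
    for f in "$dir"/*.datasource; do
        # Skip the literal pattern when no file matches it.
        if [ -e "$f" ]; then mv "$f" "$dir/datasources/"; fi
    done
    for f in "$dir"/*.pipe; do
        if [ -e "$f" ]; then mv "$f" "$dir/pipes/"; fi
    done
    return 0
}
```

Calling sort_pulled_files . right after a pull would leave the working directory organized and ready to commit.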
Pushing a development branch of your pipes¶
Most of the time, when you are developing or fixing an analysis, you want to create a new development branch in your source control repository and push your data sources and pipes to your Tinybird account with a prefix, so they don’t collide with your production ones. You can do it like this:
tb push datasources/sample.datasource --prefix dev
The prefix is prepended to the data source or pipe name, so if you have a data source called sample and you push it with --prefix dev, the new version will be pushed as
Dropping a development version of your files¶
Once you’ve finished fixing, you usually merge the development branch in your source control repository, push your files to pro tagged with a new version (or overwrite the current versions by --force pushing them), and drop the development files. For this last step, use tb drop-prefix dev. Add the --dry-run flag to check which files will be dropped:
tb drop-prefix dev --dry-run
[DRY-RUN] Removing data source dev__project
[DRY-RUN] Removing data source dev__project_geoindex
[DRY-RUN] Removing pipe dev__uc000_agg
[DRY-RUN] Removing pipe dev__project_geoindex_pipe
[DRY-RUN] Removing pipe dev__uc000_moving_avg
[DRY-RUN] Removing pipe dev__uc000_exploration
[DRY-RUN] Removing pipe dev__uc000_agg_API_endpoint_request_log_pipe_3379
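The preview and the actual drop can be chained so that nothing is removed without a look at the list first. This is only a sketch: drop_prefix_with_preview is a hypothetical wrapper, it uses just the two tb invocations shown above, and the confirmation prompt is an addition of the sketch, not a CLI feature.

```shell
# Hypothetical wrapper around the two commands shown above.
# It previews what would be removed and drops only after an explicit "y".
drop_prefix_with_preview() {
    prefix="$1"
    if [ -z "$prefix" ]; then
        echo "usage: drop_prefix_with_preview PREFIX" >&2
        return 2
    fi
    # Preview first; stop if the dry run itself fails.
    tb drop-prefix "$prefix" --dry-run || return 1
    printf 'Drop everything listed above? [y/N] '
    read -r answer
    if [ "$answer" = "y" ]; then
        tb drop-prefix "$prefix"
    else
        echo "Aborted, nothing was dropped."
    fi
}
```

For example, drop_prefix_with_preview dev prints the dry-run listing and waits for your answer before removing anything.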
Pushing the whole data project¶
If you want to push the whole project you can run:
tb push --push-deps
You can use a prefix to have a separate development version of the whole project, ingest fixtures, etc.
tb push --push-deps --prefix dev --fixtures
Pushing a pipe with all its dependencies¶
tb push pipes/mypipe.pipe --push-deps
Adding a new column to a data source¶
Data Source schemas are mostly immutable, but you can append new columns at the end of an existing Data Source whose engine belongs to the MergeTree family. If you want to change columns, add columns in other positions or modify the engine, you first need to create a new version of the Data Source with the modified schema, then ingest the data, and finally point the pipes to this new version. To force a pipe replacement, use the --force flag when pushing it.
Append new columns to an existing Data Source¶
Let’s suppose that you have a Data Source defined like this and that this Data Source has been already pushed to Tinybird:
VERSION 1
SCHEMA >
    `test` Int16,
    `local_date` Date,
    `test3` Int64
If you want to append a new column, you will need to edit the *.datasource file to include it (new_column in this example). You can append as many columns as you need at the same time:
VERSION 1
SCHEMA >
    `test` Int16,
    `local_date` Date,
    `test3` Int64,
    `new_column` Int64
Remember that the only kind of alter column operation supported is appending new columns to an existing Data Source and that the engine of that Data Source must be of the MergeTree family.
After that, you will need to execute tb push my_datasource.datasource --force and confirm the addition of the column(s). The --force parameter is required for this kind of operation.
Existing imports will continue working once the new columns are added, even if those imports don’t carry values for the added columns. In those cases, the new columns will contain the default values defined in the schema, if any, or otherwise the column type’s empty value, like 0 for numeric fields or '' for Strings.
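Because appending is the only supported alteration, it can help to check locally that an edited schema merely extends the old one before pushing. The following is a minimal sketch under loud assumptions: is_append_only is a hypothetical helper, not a CLI feature; it expects the old and new column lists as comma-separated strings written in exactly the same style, and it compares them literally without normalizing whitespace.

```shell
# Hypothetical pre-push check, not part of the Tinybird CLI.
# Succeeds only if NEW starts with OLD and then adds at least one column.
# Both arguments are comma-separated column lists in the same textual style.
is_append_only() {
    old="$1"
    new="$2"
    case "$new" in
        "$old",?*) return 0 ;;  # old columns untouched, new ones follow
        *)         return 1 ;;  # changed, reordered, removed, or nothing added
    esac
}
```

For instance, is_append_only '`test` Int16,`local_date` Date' '`test` Int16,`local_date` Date,`new_column` Int64' succeeds, while inserting new_column between the two existing columns makes it fail.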
Create a new version of the Data Source to make the rest of the add/change columns operations that you may need¶
To create a new version of a Data Source, just increase the VERSION number inside the Data Source file, taking into account that VERSION must be an integer.
VERSION 1
SCHEMA >
    `test` Int16,
    `local_date` Date,
    `test3` Int64,
    `new_column` Int64
In this case we are creating a VERSION 1 of our test.datasource adding a
Once you have increased the VERSION number, you just have to push the Data Source, ingest new data and --force push any dependent pipe where you want to use the new Data Source version.
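Increasing the number by hand is easy to forget, so the step can be scripted. This is a sketch only: bump_version is a hypothetical helper, and it assumes the data file contains a line of the form VERSION followed by an integer, as in the examples on this page.

```shell
# Hypothetical helper, not part of the Tinybird CLI.
# Increases the integer on the "VERSION <n>" line of a data file in place.
# Every other line is written back unchanged.
bump_version() {
    file="$1"
    tmp="$file.tmp.$$"
    awk '$1 == "VERSION" && $2 ~ /^[0-9]+$/ { print "VERSION", $2 + 1; next }
         { print }' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

After running bump_version on a data source file, you would push it, ingest the data and --force push the dependent pipes as described above.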
The version number is appended to the Data Source or pipe name, so if you have a pipe called this_is_my_pipe and you push a VERSION 1 of it, the new version will be pushed as
How to create materialized views¶
Materialized views allow you to transform the data from an origin data source into a destination data source. There are several use cases where materialized views are really handy and can make a difference in the response times of your analyses. To name a few:
Denormalize several normalized tables into one via a
Transform data using an optimized ENGINE for a specific analysis
Transform your source data on the fly as you ingest data in your origin data source
One important thing to know is that materialized views are live views of your origin data source. Any time you replace data in your origin data source, all the destination data sources created as materialized views are properly synced. This means you don’t have to worry about costly re-sync processes.
Let’s say you have an origin data source (my_origin.datasource) like this one:
VERSION 0
SCHEMA >
    `id` Int16,
    `local_date` Date,
    `name` String,
    `count` Int64
And you need an optimized version of this data source that pre-aggregates the count for each ID. You should create a new data source that uses a SimpleAggregateFunction, which will be a materialized view.
First define the
destination data source (
VERSION 0
SCHEMA >
    `id` Int16,
    `local_date` Date,
    `name` String,
    `total_count` SimpleAggregateFunction(sum, UInt64)
ENGINE "AggregatingMergeTree"
PARTITION_KEY "toYYYYMM(local_date)"
SORTING_KEY "local_date,id"
Then write a transformation pipe (my_transformation.pipe) like this:
VERSION 0
NODE transformation_node
SQL >
    SELECT id, local_date, name, sum(count) as total_count
    FROM my_origin
    GROUP BY id, local_date, name
TYPE materialized
DATASOURCE my_destination
Once you have the origin and destination data sources defined and the transformation pipe you can push them:
tb push my_origin.datasource
tb push my_destination.datasource
tb push my_transformation.pipe --populate
Any time you ingest data into my_origin, the data in my_destination will be automatically updated.
Using materialized columns¶
Materialized columns are another useful tool to transform data on the fly. They allow you to transform the columns of a data source by defining the transformation directly in the schema of the data source file.
Let’s say you want to parse a date column which may have a wrong format in your source CSV file. You can do it like this:
VERSION 0
SCHEMA >
    `id` Int16,
    `local_date` String,
    `parsed_local_date` DateTime MATERIALIZED parseDateTimeBestEffort(local_date)
In this case, we’ve defined local_date as String since it may not be correctly parsed, but we create a MATERIALIZED column, called parsed_local_date, using a built-in function called parseDateTimeBestEffort to convert local_date into a
You can use any of the ClickHouse built-in functions for new materialized columns and the transformation will be done at ingestion time.
Take into account that MATERIALIZED columns are not shown in the data source view in the UI. If you want to check their values, you have to create a pipe and run a query over the data source selecting the materialized columns.
How to force populate materialized views¶
Sometimes you want to force populating a materialized view, most likely because you changed the transformation in the pipe and you want the data from the origin data source to be re-ingested.
tb push my_may_view_pipe.pipe --force --populate
In response you’ll get a Jobs API job_url, so you can check the job’s progress and status.
Especially when you work with pipes that use several versions of different data sources, you might need to double-check which version of which data source the pipe points at before you push it to your Tinybird account.
To do so, use the --dry-run --debug flags like this:
tb push my_pipe.pipe --dry-run --debug
Once you’ve validated the content of the pipe, you can just push your pipe normally.
Automatic regression tests for your API endpoints¶
Any time you --force push a pipe that has a public endpoint which has received requests, some automatic regression tests are executed.
For the top ten requests, the CLI checks whether the previous version of the endpoint returns the same data as the version you are pushing. This can help you detect whether you are introducing a regression in your API.
At other times, you are deliberately --force pushing a new version that returns different data. In that case, you can skip the regression tests like this:
tb push my_may_view_pipe.pipe --force --no-check