Tinybird CLI

The Tinybird Analytics command-line tool allows you to use all the Tinybird functionality directly from the command line. Additionally, it includes several functions to create and manage data projects easily.

How to install

You need to have Python 3 and pip installed:

Supported Python versions: 3.6, 3.7, 3.8
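
You can check which Python version you have with:

Check your Python version
python3 --version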

Example: creating a virtual environment for Python 3
virtualenv -p python3 .e
. .e/bin/activate

Alternatively, you can use venv:

Example: creating a virtual environment for Python 3
python3 -m venv .e
. .e/bin/activate

If you are not used to Python virtual environments, check the official documentation for virtualenv or venv.

Install tinybird-cli
pip install tinybird-cli

The first step is to check that everything works correctly and that you’re able to authenticate:

Authenticate
tb auth
Copy the admin token from https://ui.tinybird.co/tokens and paste it here: <pasted token>
** Auth successful!
** Configuration written to .tinyb file, consider adding it to .gitignore

It will ask for your admin token: copy it from the tokens page and paste it. Your credentials are saved to a .tinyb file in the current directory. Add that file to .gitignore (or the ignore list of the SCM you use) because it contains your Tinybird credentials.
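
For example, if you use git:

Ignore the credentials file
echo ".tinyb" >> .gitignore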

Quick intro

Create a new project

Initialize
tb init

Generate a data source file (we will explain these files later) based on a sample CSV file containing a few lines of data:

Generate data source
$ tb datasource generate /tmp/sample.csv
** Generated datasources/sample.datasource
**   => Run `tb push datasources/sample.datasource` to create it on the server
**   => Add data with `tb datasource append sample /tmp/sample.csv`
**   => Generated fixture datasources/fixtures/sample.csv

Push it to Tinybird

Push data source
$ tb push datasources/sample.datasource
** Processing datasources/sample.datasource
** Building dependencies
** Creating sample
** not pushing fixtures

Append some data

Append data
$ tb datasource append sample datasources/fixtures/sample.csv
🥚 starting import process
🐥 done

Query the data

Query the data
$ tb sql "select count() from sample"

Query took 0.000475 seconds, read 1 rows // 4.1 KB

-----------
| count() |
-----------
|     384 |
-----------

Check the data source is in the data sources list

List data sources
$ tb datasource ls

    name                    row_count    size         created at                  updated at
-------------------------  -----------  -----------  --------------------------  --------------------------
sample                             384     20k       2020-06-24 15:09:00.409266  2020-06-24 15:09:00.409266
madrid_traffic                87123456     1.5Gb     2019-07-02 10:40:03.840151  2019-07-02 10:40:03.840152
...

Go to your Tinybird dashboard to check that the data source is present there.

Data projects

A data project is a set of files that describes how your data should be stored, processed, and exposed through APIs.

In the same way we maintain source code files in a repository, use CI, make deployments, run tests, and so on, Tinybird provides a set of tools to follow the same pattern with data pipelines. In other words: the data files are the source code of your Tinybird project.

Following this approach, any data project can be managed with a list of text-based files that allow you to:

  • Define how the data should flow, from the start (the schemas) to the end (the API)

  • Manage your data files under version control

  • Use branches in your data files

  • Run tests

  • Deploy a data project like you’d deploy any other software application (see the sketch below)
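
For example, a minimal deployment step in a CI pipeline could look like this sketch (the TB_ADMIN_TOKEN secret and the CI setup are assumptions, not something Tinybird provides):

Hypothetical CI deployment step
pip install tinybird-cli
# --token avoids the interactive tb auth prompt; TB_ADMIN_TOKEN is a secret you define in your CI system
tb --token "$TB_ADMIN_TOKEN" push --push-deps --force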

Let’s see an example. Imagine an e-commerce site where we have events from users and a list of products with their attributes. Our purpose is to expose several API endpoints to return sales per day and top product per day.

The data project would look like this:

ecommerce_data_project/
    datasources/
        events.datasource
        products.datasource
        fixtures/
            events.csv
            products.csv
    pipes/
        top_product_per_day.pipe

    endpoints/
        sales.pipe
        top_products.pipe

Every file in this folder maps to a data source or a pipe in Tinybird. You can create a project from scratch with tb init, but in this case let’s assume it’s already created and stored in a GitHub repository.

Uploading the project

Clone demo
git clone https://github.com/tinybirdco/ecommerce_data_project.git
cd ecommerce_data_project

Refer to the how to install section to connect the ecommerce_data_project with your Tinybird account.

You can push the whole project to your Tinybird account to check that everything is fine. The tb push command uploads the project files to Tinybird, but first it checks the project dependencies and the SQL syntax, among other things. In this case, we use the --push-deps flag to push everything:

Push dependencies
$ tb push --push-deps
** Processing ./datasources/events.datasource
** Processing ./datasources/products.datasource
** Processing ./pipes/top_product_per_day.pipe
** Processing ./endpoints/top_products_params.pipe
** Processing ./endpoints/sales.pipe
** Processing ./endpoints/top_products.pipe
** Building dependencies
** Creating products
** Creating events
** Creating products_join_by_id
** Creating top_product_per_day
** Creating sales
** => Test endpoint at https://api.tinybird.co/v0/pipes/sales.json
** Creating products_join_by_id_pipe
** Creating top_products_params
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json
** Creating top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products.json
** not pushing fixtures

Once it finishes, the endpoints defined in our project (sales and top_products) will be available and we can start pushing data to the different data sources. The project is ready.

Now, let’s go through the different files in the project in order to understand how to deal with them individually.

Define data sources

Data sources define how your data is going to be stored. You can add data to these data sources using the Data Sources API.
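
For instance, a hedged sketch of appending a local CSV file with curl (check the exact parameter names against the Data Sources API reference; the file name and the TOKEN variable are illustrative):

Append a CSV through the Data Sources API (sketch)
curl -H "Authorization: Bearer $TOKEN" \
    -X POST "https://api.tinybird.co/v0/datasources?mode=append&name=events" \
    -F csv=@events.csv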

Each data source is defined by a schema and other properties we will explain later (more on this in the Datafile reference).

Let’s see events.datasource:

DESCRIPTION >
    # Events from users
    This contains all the events produced by Kafka. There are 4 fixed columns,
    plus a `json` column which contains the rest of the data for that event.
    See [documentation](url_for_docs) for the different events.

SCHEMA >
    timestamp DateTime,
    product String,
    user_id String,
    action String,
    json String

ENGINE MergeTree
SORTING_KEY timestamp

As we can see, there are three main sections:

  • A general description (using markdown in this case)

  • The schema

  • How the data is sorted. In this case, the access pattern is most of the time by the timestamp column. If no SORTING_KEY is set, Tinybird picks one by default, usually a date or datetime column (see the sketch below).
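
For example, if most queries also filtered by product, a sketch of the same data source with a composite sorting key could end like this (assuming SORTING_KEY accepts a comma-separated column list, as ClickHouse ORDER BY does):

Sketch: composite sorting key
ENGINE MergeTree
SORTING_KEY product, timestamp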

Now, let’s push the data source:

Push the events data source
$ tb push datasources/events.datasource
** Processing datasources/events.datasource
** Building dependencies
** Creating events
** not pushing fixtures

You cannot override data sources: if you try to push a data source that already exists in your account, you’ll get an output like events already exists, skipping. If you actually need to override the data source, you can remove it first or just upload a new version.

Define data pipes

You usually don’t use the data exactly as it comes in. For example, in this project we are dealing with Kafka events, so we could query the events data source directly, but generating a live materialized view of that table is better.

For this purpose, we have pipes. Let’s see how to create a data pipe that transforms the data as it’s inserted. This is the content of pipes/top_product_per_day.pipe:

NODE only_buy_events
DESCRIPTION >
    filters all the buy events

SQL >
    SELECT
        toDate(timestamp) date,
        product,
        JSONExtractFloat(json, 'price') AS price
    FROM events
    WHERE action = 'buy'


NODE top_per_day
SQL >
    SELECT
        date,
        topKState(10)(product) AS top_10,
        sumState(price) AS total_sales
    FROM only_buy_events
    GROUP BY date

TYPE materialized
ENGINE AggregatingMergeTree
SORTING_KEY date

Each pipe can have one or more nodes. In this pipe, as we can see, we’re defining two nodes: only_buy_events and top_per_day.

  • The first one filters “buy” events and extracts some data from the json column.

  • The second one runs the aggregation.

The pattern to define a pipeline is simple: use NODE to start a new node and then SQL > to define the SQL for that node. Notice that you can use other nodes inside the SQL; in this case, the second node uses the first one, only_buy_events.

Pushing a pipe is the same as pushing a data source:

Populate
$ tb push pipes/top_product_per_day.pipe --populate
** Processing pipes/top_product_per_day.pipe
** Building dependencies
** Creating top_product_per_day
** Populate job url https://api.tinybird.co/v0/jobs/c7819921-aca0-4424-98c5-9223ca2475c3
** not pushing fixtures

In this case, the pipe ends in a materialized node. If you want to populate it with the data already in the events data source, use the --populate flag.

When using the --populate flag you get a job URL. Data population runs in the background, so you can check the status of the job at the URL provided.
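
For example, with curl (a sketch that assumes your admin token is exported as TOKEN):

Check the populate job status (sketch)
curl -H "Authorization: Bearer $TOKEN" \
    "https://api.tinybird.co/v0/jobs/c7819921-aca0-4424-98c5-9223ca2475c3"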

Define endpoints

Endpoints are the way you expose the data to be consumed. They look pretty similar to pipes and, in fact, they are pipes that transform the data, with an extra step that exposes the results.

Let’s look into endpoints/top_products.pipe:

NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day
    WHERE date > today() - interval 7 day
    GROUP BY date

The syntax is exactly the same as in the data transformation pipes, but now the results can be accessed through the endpoint https://api.tinybird.co/v0/pipes/top_products.json?token=TOKEN

When you push an endpoint, a token with PIPE:READ permissions is automatically created. You can see it in the tokens UI or directly from the CLI with the command tb pipe token_read <endpoint_name>.

Let’s push it now:

Push the top products pipe
$ tb push endpoints/top_products.pipe
** Processing endpoints/top_products.pipe
** Building dependencies
** Creating top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products.json
** not pushing fixtures
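
Once pushed, the endpoint can be called with any HTTP client. For example, with curl (replace <TOKEN> with a read token for the pipe):

Call the endpoint (sketch)
curl "https://api.tinybird.co/v0/pipes/top_products.json?token=<TOKEN>"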

It’s possible to add parameters to any endpoint. For example, let’s parametrize the dates to be able to filter the data between two dates:

NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    %
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day
    WHERE date BETWEEN {{Date(start)}} AND {{Date(end)}}
    GROUP BY date

Now, the endpoint can receive start and end parameters: https://api.tinybird.co/v0/pipes/top_products.json?start=2018-09-07&end=2018-09-17&token=TOKEN

The supported types for the parameters are: Boolean, DateTime, Date, Float32, Float64, Int, Integer, Int8, Int16, UInt8, UInt16, UInt32, Int32, Int64, UInt64, Symbol, String

Note that for the parameter templating to work, you need to start your node’s SQL definition with the % character.
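
As a side note, template functions also accept a default value as a second argument, which makes the parameters optional. A hedged sketch (the default dates are illustrative; double-check this syntax against the templates documentation):

Sketch: parameters with default values
%
SELECT
    date,
    topKMerge(10)(top_10) AS top_10
FROM top_per_day
WHERE date BETWEEN {{Date(start, '2018-09-07')}} AND {{Date(end, '2018-09-17')}}
GROUP BY date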

Overriding an endpoint or a data pipe

When working on a project, you usually need to push several versions of the same file. You can override a pipe that has already been pushed using the --force flag.

Override the pipe
$ tb push endpoints/top_products_params.pipe --force

** Processing endpoints/top_products_params.pipe
** Building dependencies
** Creating top_products_params
current https://api.tinybird.co/v0/pipes/top_products_params.json?start=2020-01-01&end=2010-01-01
    new https://api.tinybird.co/v0/pipes/top_products_params__checker.json?start=2020-01-01&end=2010-01-01 ... ok
current https://api.tinybird.co/v0/pipes/top_products_params.json?start=2010-01-01&end=2021-01-01
    new https://api.tinybird.co/v0/pipes/top_products_params__checker.json?start=2010-01-01&end=2021-01-01 ... ok
**    => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json

It will override the endpoint. If the endpoint has been called before, the CLI runs regression tests with the most frequent requests (up to 10); if the new version doesn’t return the same data, it’s not pushed. You can see the requests that were tested in the example output above.

However, it’s possible to force the push without running the checks using the --no-check flag:

Force override
$ tb push endpoints/top_products_params.pipe --force --no-check
** Processing endpoints/top_products_params.pipe
** Building dependencies
** Creating top_products_params
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json

This check is a safeguard to avoid breaking production environments: it’s better to have to type an extra flag than to be sorry.

Test different environments or branches of the project

It’s common to work with data projects under version control, and of course you work with branches in the same way you do with your application code. So, what happens if you want to test a branch with some new features? With Tinybird you can create a new branch of your project with the --prefix flag:

Use a prefix
$ tb push --push-deps --prefix my_new_feature

It pushes the same files using the same project structure and nomenclature but adding the my_new_feature prefix to every data source, pipe, and endpoint.

One nice usage of the --prefix flag is the ability to maintain staging and production environments for your data project.
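
A sketch of that workflow (the stg and pro prefixes are just a naming choice):

Staging and production workflow (sketch)
# push the whole project to a staging environment and test the stg__* endpoints there
tb push --push-deps --prefix stg
# when everything looks good, push the same files to production
tb push --push-deps --prefix pro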

Downloading data files from Tinybird

Sometimes you use the user interface to create pipes, and then you want to store them in your data project. It’s possible to download data files using the pull command:

Pull a specific file
$ tb pull --match endpoint_im_working_on

It will download the endpoint_im_working_on.pipe directly to the current folder.

You can also specify a prefixed version of a pipe or data source to pull. Let’s say you have staging and production versions of pipes/top_product_per_day.pipe. You created them like this:

Staging and production versions
$ tb push pipes/top_product_per_day.pipe --prefix stg
$ tb push pipes/top_product_per_day.pipe --prefix pro

So in your account you have these two versions:

Staging and production versions
$ tb pipe ls
---------------------------------------------------------------------------------------------------
| prefix      | version | name                                      | published date      | nodes |
---------------------------------------------------------------------------------------------------
|             |         | getting_started_pipe                      | 2020-07-06 15:42:50 |    11 |
| stg         |         | top_product_per_day                       | 2020-10-02 11:34:33 |     2 |
| pro         |         | top_product_per_day                       | 2020-10-02 11:34:33 |     2 |
---------------------------------------------------------------------------------------------------

If you want to sync your local copy of the staging version, you can run this command:

Pull with prefix
$ tb pull --match top_product_per_day --prefix stg

Working with versions

Data sources, endpoints, and pipes change over time. Versions are a good way to organize these changes.

The version system is simple:

  • Each resource might have a version. It’s specified with VERSION <number> in its data file.

  • When a resource is pushed, it uses the versions of the dependencies found in the local files. For example, if a pipe uses a data source and both files have VERSION 1 locally, when you push the pipe it will use version 1 of the data source, even if the server has other versions.

You can check which version is set for each resource with the tb datasource ls or tb pipe ls commands.

An example of a data source with a defined version:

# this data source is in version 3
VERSION 3
DESCRIPTION generated from /Users/matias_el_humilde/tmp/sample.csv

SCHEMA >
    `d` DateTime,
    `total` Int32,
    `from_novoa` Int16

Versions are optional: resources can exist without a version, but we encourage you to use them for all the resources even if you only need to version some of them.

Versions start to pay off when you put your data sources and endpoints in a production environment (that is, they are integrated into an application or another workflow) and you want to keep working on your endpoints without disrupting the applications that use them: you create a new version and keep iterating on it until it’s ready to be published.
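
A sketch of that workflow, using only the commands seen so far (the version numbers are illustrative):

Versioning workflow (sketch)
# 1. bump VERSION in the datafile, e.g. from VERSION 0 to VERSION 1
# 2. push it; following the naming convention described in the next section it becomes a
#    separate resource, while version 0 keeps serving the existing applications
tb push endpoints/top_products.pipe
# 3. point your application at the new version once it is ready to be published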

Naming conventions

The VERSION system and the --prefix flag both use this convention to rename your pipes and datasources:

{prefix}__{datasource|pipe}__{version}

This is important to note because for certain operations, such as running a SQL query or removing a data source, you need to provide the full name (including the prefix and version).

For instance, if you created version 0 of a data source in stg like this:

Create datasource with version and prefix
$ tb push datasources/event.datasource --prefix stg

When you want to remove it you do it like this:

Remove datasource with version and prefix
$ tb datasource rm stg__event__v0
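
The same full name applies to other operations, for example when querying it with tb sql:

Query a prefixed, versioned data source
tb sql "select count() from stg__event__v0"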

Datafile reference

Syntax

Data files follow a really simple syntax:

Basic syntax
CMD value
# this is a comment
OTHER_CMD "value with multiple words"

or

Multiline syntax
CMD >
    multi
    line
    values
    are indented

A simple example:

Schema syntax
DESCRIPTION generated from /Users/matias_el_humilde/tmp/sample.csv

SCHEMA >
    `d` DateTime,
    `total` Int32,
    `from_novoa` Int16

Datafile commands

Common:

  • VERSION <integer_number> - Defines the version for the resources

Data Sources:

  • SCHEMA <schema definition> - Defines a schema, only valid for .datasource files

Pipes:

  • NODE <node_name> - Starts the definition of a new node; all the commands until the next NODE command or the end of the file will be related to this node

  • SQL <sql> - Defines the SQL for a node

  • DESCRIPTION <markdown_string> - Sets the description for a node or the complete file

  • INCLUDE <include_path.incl> <variables> - Includes are pieces of a pipe that can be reused in multiple pipe datafiles

  • TYPE <pipe_type> - Sets the type of the node. By default it’s ‘standard’; it can be set to ‘materialized’

  • DATASOURCE <data_source_name> - Sets the destination data source for materialized nodes (see the sketch below)
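
Putting some of these commands together, here is a hedged sketch of a pipe node materialized into an explicit destination data source (the names are illustrative, and it assumes a buys_per_day_mv data source with an AggregatingMergeTree engine is defined in the project):

Sketch: materialized node with an explicit destination
NODE buys_per_day
SQL >
    SELECT
        toDate(timestamp) AS date,
        countState() AS buys
    FROM events
    WHERE action = 'buy'
    GROUP BY date

TYPE materialized
DATASOURCE buys_per_day_mv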

Integrated help

Once you’ve installed the CLI you can access the integrated help:

Integrated help
$ tb --help
Usage: tb [OPTIONS] COMMAND [ARGS]...

Options:
--debug / --no-debug  Print internal representation
--token TEXT          Set auth token
--host TEXT           Set custom host if it's different than
                        https://api.tinybird.co
--version             Show the version and exit.
--help                Show this message and exit.

Commands:
auth          Configure auth
check         Check file syntax
datasource    Data sources commands
dependencies  Print all data sources dependencies
drop-prefix   drop all the resources inside a project with prefix This...
init          Initialize folder layout
pipe          Pipes commands
pull          Retrieve latest versions for project files from Tinybird
push          Push files to Tinybird
sql           Run SQL query over data sources and pipes

And you can do the same for every available command, so you don’t need to know every detail for every command:

Integrated command help
$ tb datasource --help
Usage: tb datasource [OPTIONS] COMMAND [ARGS]...

Data sources commands

Options:
--help  Show this message and exit.

Commands:
analyze   Analyze a URL before creating a new data source
append    Create a data source from a URL or local file
generate  Generates a data source file based on a sample CSV file from local
            disk or url
ls        List data sources
rm        Delete a data source

Full Command list

auth

Configure auth

check

Check file syntax

datasource

Data sources commands

datasource analyze

Analyze a URL before creating a new data source

datasource append

Create a data source from a URL or local file

datasource generate

Generates a data source file based on a sample CSV file from local disk or url

datasource ls

List data sources

datasource rm

Delete a data source

dependencies

Print all data sources dependencies

drop-prefix

Drops all the resources inside a project with a prefix. This command is dangerous because it removes everything; use with care

init

Initializes folder layout

pipe

Pipes commands

pipe append

Append a node to a pipe

pipe data

Print data returned by a pipe

pipe generate

Generates a pipe file based on a sql query

pipe ls

List pipes

pipe new

Create a new pipe

pipe rm

Delete a pipe

pipe set_endpoint

Change the published node of a pipe

pipe token_read

Retrieve a token to read a pipe

pull

Retrieve latest versions for project files from Tinybird

push

Push files to Tinybird

sql

Run SQL query over data sources and pipes

Supported platforms

It supports Linux and macOS versions above 10.14.