
Data Sources API

The Data Sources API allows you to list, create, update or delete your Tinybird Data Sources. You can also insert or delete data from Data Sources.

New to Data Sources? Read more about them here

All endpoints require authentication using an Auth Token with the appropriate scope.

POST /v0/datasources/?

This endpoint supports three modes, enabling three distinct operations depending on the parameters provided:

  • Create a new Data Source with a schema

  • Append data to an existing Data Source

  • Replace data in an existing Data Source

The mode is controlled by setting the mode parameter, for example, -d "mode=create".

When importing remote files by URL, if the server hosting the remote file supports HTTP Range headers, the import process will be parallelized.

Request parameters

mode (String): Default: create. Other modes: append and replace.
The create mode creates a new Data Source and attempts to import the data if a CSV URL is provided or the request body contains data.
The append mode inserts the new rows provided into an existing Data Source (it will also create it if it does not exist yet).
The replace mode removes the previous Data Source and its data and replaces it with the new one; once the replace operation is complete, Pipes and queries pointing to this Data Source will start returning data from the new one immediately and without disruption.
The create mode will automatically name the Data Source if no name parameter is provided; for the append and replace modes to work, the name parameter must be provided and the schema must be compatible.

name (String): Optional. Name of the Data Source to create, or to append or replace data in. This parameter is mandatory when using the append or replace modes.

url (String): Optional. The URL of the CSV with the data to be imported.

dialect_delimiter (String): Optional. The one-character string separating the fields. Tinybird tries to guess the delimiter from the CSV contents using some statistics, but sometimes fails to identify the correct one. If you know your CSV's field delimiter, you can use this parameter to define it explicitly.

dialect_new_line (String): Optional. The one- or two-character string separating the records. Tinybird tries to guess the delimiter from the CSV contents using some statistics, but sometimes fails to identify the correct one. If you know your CSV's record delimiter, you can use this parameter to define it explicitly.

dialect_escapechar (String): Optional. The escapechar removes any special meaning from the following character. This is useful if the CSV does not use double quotes to encapsulate a column but does use double quotes inside the content of a column, escaped with, for example, a backslash.

schema (String): Optional. Data Source schema in the format 'column_name Type, column_name_2 Type2, ...'. When creating a Data Source with format ndjson, the schema must include the JSONPath for each column; see the JSONPaths section for more details.

engine (String): Optional. Engine for the underlying data. Requires the schema parameter.

engine_* (String): Optional. Engine parameters and options; check the Engines section for more details.

progress (String): Default: false. When set to true and the data is sent in the request body, Tinybird returns block status while loading, using line-delimited JSON.

token (String): Auth token with create or append permissions. Required only if no Bearer Authorization header is sent.

type_guessing (String): Default: true. When set to false, all columns are created as String; otherwise Tinybird tries to guess the column types based on the CSV contents. Disabling type guessing is useful when you are not yet familiar with the data: you can quickly import everything as strings, explore it with SQL, and cast each column to the right type or shape via a Pipe. This parameter is not taken into account when appending to or replacing an existing Data Source.

debug (String): Optional. Enables returning debug information from logs. It can include blocks, block_log and/or hook_log.

replace_condition (String): Optional. When used in combination with the replace mode, it allows you to replace only the portion of your Data Source that matches the replace_condition SQL statement with the contents of the url or query passed as a parameter. See this guide to learn more.

format (String): Default: csv. Indicates the format of the data to be ingested into the Data Source. Use format=ndjson for NDJSON and format=parquet for Parquet files.

Examples

Creating a CSV Data Source from a schema
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "name=stocks" \
-d "schema=symbol String, date Date, close Float32"
Creating a CSV Data Source from a local CSV file with schema inference
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?name=stocks" \
-F csv=@local_file.csv
Creating a CSV Data Source from a remote CSV file with schema inference
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "name=stocks" \
-d url='https://.../data.csv'
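Creating a CSV Data Source with an explicit field delimiter (a sketch using the dialect_delimiter parameter described above; the URL is a placeholder)
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "name=stocks" \
-d "dialect_delimiter=;" \
-d url='https://.../data.csv'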
Creating an empty Data Source with a ReplacingMergeTree engine and custom engine settings
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "schema=pk UInt64, insert_date Date, close Float32" \
-d "engine=ReplacingMergeTree" \
-d "engine_sorting_key=pk" \
-d "engine_ver=insert_date" \
-d "name=test123" \
-d "engine_settings=index_granularity=2048, ttl_only_drop_parts=false"
Appending data to a Data Source from a local CSV file
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?name=data_source_name&mode=append" \
-F csv=@local_file.csv
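Appending a local CSV file while reporting progress (a sketch; per the progress parameter above, block status is returned as line-delimited JSON while loading)
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?name=data_source_name&mode=append&progress=true" \
-F csv=@local_file.csv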
Appending data to a Data Source from a remote CSV file
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d mode='append' \
-d name='data_source_name' \
-d url='https://.../data.csv'
Replacing data with a local file
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?name=data_source_name&mode=replace" \
-F csv=@local_file.csv
Replacing data with a remote file from a URL
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d mode='replace' \
-d name='data_source_name' \
--data-urlencode "url=http://example.com/file.csv"
GET /v0/datasources/?
Getting a list of your Data Sources
curl \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
-X GET "https://api.tinybird.co/v0/datasources"

Get a list of the Data Sources in your account.

The token you use to query the available Data Sources will determine what Data Sources get returned: only those accessible with the token you are using will be returned in the response.

Successful response
{
    "datasources": [{
        "id": "t_a049eb516ef743d5ba3bbe5e5749433a",
        "name": "your_datasource_name",
        "cluster": "tinybird",
        "tags": {},
        "created_at": "2019-11-13 13:53:05.340975",
        "updated_at": "2022-02-11 13:11:19.464343",
        "replicated": true,
        "version": 0,
        "project": null,
        "headers": {},
        "shared_with": [
            "89496c21-2bfe-4775-a6e8-97f1909c8fff"
        ],
        "engine": {
            "engine": "MergeTree",
            "engine_sorting_key": "example_column_1",
            "engine_partition_key": "",
            "engine_primary_key": "example_column_1"
        },
        "description": "",
        "used_by": [],
        "type": "csv",
        "columns": [{
                "name": "example_column_1",
                "type": "Date",
                "codec": null,
                "default_value": null,
                "jsonpath": null,
                "nullable": false,
                "normalized_name": "example_column_1"
            },
            {
                "name": "example_column_2",
                "type": "String",
                "codec": null,
                "default_value": null,
                "jsonpath": null,
                "nullable": false,
                "normalized_name": "example_column_2"
            }
        ],
        "statistics": {
            "bytes": 77822,
            "row_count": 226188
        },
        "new_columns_detected": {},
        "quarantine_rows": 0
    }]
}
Request parameters

attrs (String): Comma-separated list of the Data Source attributes to return in the response. Example: attrs=name,id,engine. Leave empty to return a full response.

Note that the bytes and row_count attributes inside statistics might be null depending on how the Data Source was created.
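
For example, to return only a few attributes per Data Source:

Listing Data Sources returning only selected attributes
curl \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
-X GET "https://api.tinybird.co/v0/datasources?attrs=name,id,engine"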

POST /v0/datasources/(.+)/alter

Modify the Data Source schema.

This endpoint supports altering the following fields of a Data Source, via the request parameters described below:

Request parameters

schema (String): Optional. Sets the whole schema; new columns must be appended after the existing ones of the Data Source.

description (String): Optional. Sets the description of the Data Source.

kafka_store_raw_value (Boolean): Optional. Default: false. When set to true, the 'value' column of a Kafka Data Source will save the JSON as a raw string.

kafka_store_headers (Boolean): Optional. Default: false. When set to true, the 'headers' of a Kafka Data Source will be saved as JSON.

ttl (String): Optional. Set to any value accepted by ClickHouse for a TTL, or to 'false' to remove the TTL.

dry (Boolean): Optional. Default: false. Set to true to show what would be modified in the Data Source, without running any modification at all.

The schema parameter can be used to add new columns at the end of the existing ones in a Data Source.

Be aware that Tinybird currently does not validate whether the change will affect the existing Materialized Views (MVs) attached to the Data Source being modified, so this change may break existing MVs. For example, avoid changing a Data Source that has an MV created with something like SELECT * FROM Data Source .... If you want MVs that are forward compatible with column additions, create them specifying the columns explicitly instead of using the * operator.

Also, take into account that, for now, the only engines supporting the addition of new columns are those in the MergeTree family.

To add a column to a Data Source, call this endpoint with the Data Source name and the new schema definition.

For example, having a Data Source created like this:

Creating a Data Source from a schema
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "name=stocks" \
-d "schema=symbol String, date Date, close Float32"

If you want to add a new column 'concept String', you need to call this endpoint with the new schema:

Adding a new column to an existing Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "schema=symbol String, date Date, close Float32, concept String"

If everything went well, the response will list the operations performed:

ADD COLUMN operation resulting from the schema change.
{
    "operations": [
        "ADD COLUMN `concept` String"
    ]
}

You can also view the inferred operations without executing them by adding dry=true to the parameters.
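
For example, to preview the previous schema change without applying it:

Previewing a schema change with dry=true
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "dry=true" \
-d "schema=symbol String, date Date, close Float32, concept String"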

  • To modify the description of a Data Source:

Modifying the description of a Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "name=stocks" \
-d "description=My new description"
  • To save the JSON as a raw string in the "value" column of a Kafka Data Source, and to store the Kafka message headers:

Saving the raw string in the value column of a Kafka Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "name=stocks" \
-d "kafka_store_raw_value=true"
-d "kafka_store_headers=true"
  • To modify the TTL of a Data Source:

Modifying the TTL of a Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "name=stocks" \
-d "ttl=12 hours"
  • To remove the TTL of a Data Source:

Removing the TTL of a Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "name=stocks" \
-d "ttl=false"

You can also alter the JSONPaths of existing Data Sources. In that case you have to specify the JSONPath in the schema in the same way as when you created the Data Source.
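
For example, a minimal sketch of changing a column's JSONPath (the Data Source name, columns, and paths are illustrative):

Altering the JSONPath of an existing column
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/events/alter" \
--data-urlencode 'schema=timestamp DateTime `json:$.timestamp`, event String `json:$.event_name`'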

POST /v0/datasources/(.+)/truncate

Truncates a Data Source in your account. If the Data Source has dependent Materialized Views, those won’t be truncated in cascade. In case you want to delete data from other dependent Materialized Views, you’ll have to do a subsequent call to this method. Auth token in use must have the DATASOURCES:CREATE scope.

Truncating a Data Source
curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/name/truncate"

This works as well for the quarantine table of a Data Source. Remember that the quarantine table for a Data Source has the same name but with the “_quarantine” suffix.

Truncating the quarantine table from a Data Source
curl \
    -H "Authorization: Bearer <DATASOURCES:DROP token>" \
    -X POST "https://api.tinybird.co/v0/datasources/:name_quarantine/truncate"
POST /v0/datasources/(.+)/delete

Deletes rows from a Data Source in your account given a SQL condition. Auth token in use must have the DATASOURCES:CREATE scope.

Deleting rows from a Data Source given a SQL condition
curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    --data "delete_condition=(country='ES')" \
    "https://api.tinybird.co/v0/datasources/:name/delete"

When deleting rows from a Data Source, the response will not be the final result of the deletion but a Job. You can check the job status and progress using the Jobs API. In the response, id, job_id, and delete_id should have the same value:

Delete API Response
{
    "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "job_url": "https://api.tinybird.co/v0/jobs/64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "job": {
        "kind": "delete_data",
        "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
        "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
        "status": "waiting",
        "created_at": "2023-04-11 13:52:32.423207",
        "updated_at": "2023-04-11 13:52:32.423213",
        "started_at": null,
        "is_cancellable": true,
        "datasource": {
            "id": "t_c45d5ae6781b41278fcee365f5bxxxxx",
            "name": "shopping_data"
        },
        "delete_condition": "event = 'search'"
    },
    "status": "waiting",
    "delete_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b"
}

To check on the progress of the delete job, use the job_id from the Delete API response to query the Jobs API.

For example, to check on the status of the above delete job:

Checking the status of the delete job
curl \
    -H "Authorization: Bearer <TOKEN>" \
    https://api.tinybird.co/v0/jobs/64e5f541-xxxx-xxxx-xxxx-00524051861b

Would respond with:

Job API Response
{
    "kind": "delete_data",
    "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "status": "done",
    "created_at": "2023-04-11 13:52:32.423207",
    "updated_at": "2023-04-11 13:52:37.330020",
    "started_at": "2023-04-11 13:52:32.842861",
    "is_cancellable": false,
    "datasource": {
        "id": "t_c45d5ae6781b41278fcee365f5bc2d35",
        "name": "shopping_data"
    },
    "delete_condition": " event = 'search'",
    "rows_affected": 100
}

Data Source engines supported

Tinybird uses ClickHouse as the underlying storage technology. ClickHouse features different strategies to store data; these strategies define not only where and how the data is stored, but also what kind of data access, queries, and availability your data has. In ClickHouse terms, a Tinybird Data Source uses a Table Engine that determines those factors.

Currently, Tinybird supports deleting data for Data Sources with the following engines:

  • MergeTree

  • ReplacingMergeTree

  • SummingMergeTree

  • AggregatingMergeTree

  • CollapsingMergeTree

  • VersionedCollapsingMergeTree

Dependent views deletion

If the Data Source has dependent Materialized Views, those won’t be cascade deleted. In case you want to delete data from other dependent Materialized Views, you’ll have to do a subsequent call to this method for the affected view with a proper delete_condition. This applies as well to the associated quarantine Data Source.
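
For example, assuming a dependent Materialized View named sales_by_country_mv (an illustrative name), you would delete its rows with a separate call:

Deleting rows from a dependent Materialized View
curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    --data "delete_condition=(country='ES')" \
    "https://api.tinybird.co/v0/datasources/sales_by_country_mv/delete"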

Request parameters

delete_condition (String): Mandatory. A string representing the WHERE SQL clause you'd add to a regular DELETE FROM <table> WHERE <delete_condition> statement. Most of the time you might want to write a simple delete_condition such as column_name=value, but any valid SQL expression, including conditional operators, is valid.

dry_run (String): Default: false. It allows you to test the deletion. When set to true, it will execute all deletion validations and return the number of matched rows_to_be_deleted.
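
For example, to run all deletion validations and see how many rows would match, without deleting anything:

Testing a deletion with dry_run
curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    --data "delete_condition=(country='ES')" \
    --data "dry_run=true" \
    "https://api.tinybird.co/v0/datasources/:name/delete"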

GET /v0/datasources/(.+)
Getting information about a particular Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
-X GET "https://api.tinybird.co/v0/datasources/datasource_name"

Get Data Source information and stats. The token provided must have read access to the Data Source.

Successful response
{
    "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea",
    "name": "datasource_name",
    "statistics": {
        "bytes": 430833,
        "row_count": 3980
    },
    "used_by": [{
        "id": "t_efdc62b5e67142bd9bf9a7f113a34353",
        "name": "pipe_using_datasource_name"
    }],
    "updated_at": "2018-09-07 23:50:32.322461",
    "created_at": "2018-11-28 23:50:32.322461",
    "type": "csv"
}
Request parameters

attrs (String): Comma-separated list of the Data Source attributes to return in the response. Example: attrs=name,id,engine. Leave empty to return a full response.

id and name are two ways to refer to the Data Source in SQL queries and API endpoints. The only difference is that the id never changes: it will keep working even if you change the name (which is the name used to display the Data Source in the UI). In general, you can use id and name interchangeably:

Using the above response as an example:

select count(1) from datasource_name

is equivalent to

select count(1) from t_bd1c62b5e67142bd9bf9a7f113a2b6ea

The id t_bd1c62b5e67142bd9bf9a7f113a2b6ea is not a descriptive name, so you can add a description like t_my_events_datasource.bd1c62b5e67142bd9bf9a7f113a2b6ea

The statistics property contains information about the table. Those numbers are an estimation: bytes is the estimated data size on disk and row_count the estimated number of rows. These statistics are updated whenever data is appended to the Data Source.

The used_by property contains the list of pipes that are using this data source. Only Pipe id and name are sent.

The type property indicates the format used when the Data Source was created. Available formats are csv, ndjson, and parquet. The Data Source type indicates what file format you can use to ingest data.

DELETE /v0/datasources/(.+)
Dropping a Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:DROP token>" \
-X DELETE "https://api.tinybird.co/v0/datasources/:name"

Drops a Data Source from your account.

Request parameters

force (String): Default: false. The force parameter is taken into account when trying to delete a Data Source with dependent Materialized Views. By default (false) the deletion will not be carried out; set it to true to enable it. If the given Data Source is being used as the trigger of a Materialized Node, it will not be deleted in any case.

dry_run (String): Default: false. It allows you to test the deletion. When set to true, it will execute all deletion validations and return the possibly affected materializations and other dependencies of the given Data Source.

token (String): Auth token. Only required if no Bearer Authorization header is sent. It must have DROP:datasource_name scope for the given Data Source.
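
For example, to check the materializations and other dependencies that would be affected before actually dropping the Data Source:

Checking the effect of dropping a Data Source with dry_run
curl \
-H "Authorization: Bearer <DATASOURCES:DROP token>" \
-X DELETE "https://api.tinybird.co/v0/datasources/:name?dry_run=true"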

PUT /v0/datasources/(.+)

Update Data Source attributes

Updating the name of a Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X PUT "https://api.tinybird.co/v0/datasources/:name?name=new_name"
Promoting a Data Source to a Snowflake one
curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X PUT "https://api.tinybird.co/v0/datasources/:name" \
    -d "connector=1d8232bf-2254-4d68-beff-4dd9aa505ab0" \
    -d "service=snowflake" \
    -d "cron=*/30 * * * *" \
    -d "query=select a, b, c from test" \
    -d "mode=replace" \
    -d "external_data_source=database.schema.table" \
    -d "ingest_now=True" \
Request parameters

name (String): New name for the Data Source.

token (String): Auth token. Only required if no Bearer Authorization header is sent. It should have DATASOURCES:CREATE scope for the given Data Source.

connector (String): Connector ID to link the Data Source to.

service (String): Type of service to promote it to. Only 'snowflake' or 'bigquery' allowed.

cron (String): Cron-like pattern to execute the connector's job.

query (String): Optional. Custom query to collect from the external data source.

mode (String): Only replace is allowed for connectors.

external_data_source (String): External data source to use for Snowflake.

ingest_now (Boolean): Set to true to ingest the data immediately instead of waiting for the first execution determined by cron.