Data Sources API - Importing Data and Managing your Data Sources¶
The Data Sources API enables you to create and manage your Data Sources, as well as import data into them.
To use the Data Sources API, you must use an Auth token with the right permissions, depending on whether you want to CREATE, APPEND, READ, or DROP (or a combination of those).
Importing data into Tinybird Analytics¶
Tinybird Analytics is specifically designed to ingest, process and analyze data in CSV, NDJSON, and Parquet format. CSV files must have one line for each row of data and comma-separated fields, with the column headers in the first row. See the API reference to learn how to ingest NDJSON and Parquet files.
You can import your data into Tinybird Analytics by creating a new Data Source. Tinybird Analytics will automatically detect and optimize your column types so you don’t have to worry about anything and can start analyzing your data right away.
curl \
-H "Authorization: Bearer <import_token>" \
-X POST "https://api.tinybird.co/v0/datasources?url=https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2018-12.csv"
Advanced users can explicitly analyze their data before importing.
If you are looking for information on how to create a new Pipe, take a look at the Pipes API reference.
- POST /v0/datasources/?¶
There are three mode operations to import data into a Data Source: create, append, and replace:
create: Creates the Data Source with its schema.
append: Loads data into a Data Source that already exists. If the Data Source has dependent Materialized Views, data will be appended in cascade.
replace: Selectively replaces data in an existing Data Source, or completely replaces its content. If the Data Source has dependent Materialized Views, these will be replaced in cascade.
When creating a Data Source, it’s necessary to define the schema of the Data Source. It can be done in two different ways:
Automatically: When you have a CSV file or a URL pointing to a CSV file, you can create a Data Source from the file itself; Tinybird will take care of guessing the schema. Setting type_guessing=false disables guessing, and all the columns will be created as String (see the sketch after this list).
Manually: When you already know your schema, or you want to change or optimize a schema.
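A minimal sketch of the automatic path with guessing disabled, reusing the example file from the introduction and the type_guessing parameter documented in the Request parameters table below:
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?type_guessing=false" \
--data-urlencode "url=https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2018-12.csv"
Every column of the resulting Data Source will be created as String, which you can then explore and cast with SQL.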
You can guess the best schema for a CSV, NDJSON, or Parquet file (either local or remote) with the Analyze API or using the UI.
In the first case, you are actually doing two operations in one: first, a create operation using the guessed schema, and then an append operation that ingests the data.
Creating a Data Source from a schema
To create an empty Data Source, you must pass a schema with your desired column names and types, and leave the url parameter empty.
When you want more granular control over the Data Source schema, you can specify it directly when creating the Data Source. These are some cases where setting the schema explicitly is the best option:
You are already familiar with your data and know the types in advance.
For optimization purposes. For example, you know a column is a 32-bit integer instead of a 64-bit one.
Making some columns optional by defining them as Nullable.
When Tinybird’s guessing fails and incorrectly identifies a column data type. We do our best, but sometimes we get it wrong!
Creating a CSV Data Source from a schema¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "name=stocks" \
-d "schema=symbol String, date Date, close Float32"
Quarantine rows
Imports do not stop when Tinybird finds rows that do not match the Data Source schema; instead, those rows are stored in a “quarantine” Data Source. This quarantine Data Source is automatically created along with each Data Source and contains the same columns as the original, but with Nullable(String) as the data type. Those records can be processed later to recover them. Learn more about quarantine Data Sources.
Importing data from a CSV
You import data by specifying the URL of a CSV file or by sending the CSV data in the request body.
Tinybird guesses the Data Source schema (the columns and their data types) and the partition and sorting keys (how the data is stored) from the CSV contents. Once the Data Source has been created, it is recommended not to include the CSV header anymore, for performance reasons. Check out our guide for tuning CSVs for fast ingestion. However, if the header is included and contains all the names present in the Data Source schema, the ingestion will work even if the columns follow a different order than the one used when the Data Source was created.
Using a URL
When using a URL for the CSV file, the URL must be remotely accessible. If the server supports HTTP Range headers, the import process will be parallelized. The URL must be encoded.
Creating a Data Source from a remote CSV file¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
--data-urlencode "url=http://example.com/file.csv"
When importing a CSV via a URL, the response will not be the final result of the import but a Job. You can check the job status and progress using the Jobs API, as sketched below. In the response, id, job_id, and import_id should have the same value.
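A minimal sketch of polling that Job (the job_url format is the one shown in the Jobs responses later in this page; <job_id> is the id returned by the import request):
curl \
-H "Authorization: Bearer <your Auth token>" \
-X GET "https://api.tinybird.co/v0/jobs/<job_id>"
The response includes the job status (waiting, working, done or error), so you can poll until it reaches done.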
Using a local file
You can also use this endpoint to import data to a Data Source from a local file.
Creating a Data Source from local CSV files¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-F csv=@local_file.csv
Successful response¶{ "id": "e9ae235f-f139-43a6-7ad5-a1e17c0071c2", "datasource": { "id": "t_0ab7a11969fa4f67985cec481f71a5c2", "name": "your_datasource_name", "cluster": null, "tags": {}, "created_at": "2019-03-12 17:45:04", "updated_at": "2019-03-12 17:45:04", "statistics": { "bytes": 1397, "row_count": 4 }, "replicated": false, "version": 0, "project": null, "used_by": [], "type": "csv" }, "error": false }
In this case, you will receive the final import result in the response, so you won’t need to check a job status. You can also track the progress by using the progress=true parameter.
Since the default mode is create, if you want to append or replace data in a Data Source that already exists, you must specify both the mode and the name:
Appending data to a Data Source from a local CSV file¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?mode=append&name=<data_source_name>" \
-F csv=@local_file.csv
Replacing a Data Source from a URL¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d mode='replace' \
-d name='data_source_name' \
--data-urlencode "url=http://example.com/file.csv"
There’s a debug option, which is false by default. You can specify which debug information you want to receive: blocks, block_log, and/or hook_log.
Using debug parameter¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?debug=blocks,block_log" \
-F csv=@local_file.csv
Response example:
Successful response¶{ "import_id": "e9ae235f-f139-43a6-7ad5-a1e17c0071c2", "datasource": { "id": "t_0ab7a11969fa4f67985cec481f71a5c2", "name": "your_datasource_name", "cluster": null, "tags": {}, "created_at": "2019-03-12 17:45:04", "updated_at": "2019-03-12 17:45:04", "statistics": { "bytes": 1397, "row_count": 4 }, "replicated": false, "version": 0, "project": null, "used_by": [] }, "error": false, "blocks": [{ "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04", "process_return": [{ "lines": 100, "parser": "python", "quarantine": 0, "time": 0.0274507999420166, "invalid_lines": 0, "empty_lines": 1, "bytes": 1234, }], "processing_time": 0.13909363746643066, "processing_error": null, "processing_error_type": null }], "block_log": [{ "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04", "status": "processing", "timestamp": 1594742501.624614 }, { "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04", "status": "guessing", "timestamp": 1594742501.624639 }, { "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04", "status": "inserting_chunk:0", "timestamp": 1594742501.747998 }, { "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04", "status": "done_inserting_chunk:0", "timestamp": 1594742501.753025 }, { "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04", "status": "done", "timestamp": 1594742501.753836 }] }
Importing data from NDJSON or Parquet
You import data by specifying the URL of an NDJSON or Parquet file, or by sending the NDJSON/Parquet data in the request body, but before that you have to create the Data Source manually.
The process consists of:
(Optionally) Use the Analyze API to get the suggested schema and JSONPaths for creating the Data Source.
Create the Data Source with mode=create, using the schema and JSONPaths from the previous step or written by you manually.
Ingest data, passing the file with mode=append (or mode=replace).
Creating a Data Source using NDJSON/Parquet Data
To create a Data Source using NDJSON/Parquet data, you must pass a schema with your desired column names, types and JSONPaths, and indicate format=ndjson or format=parquet as appropriate. Let’s see an example:
Creating a Data Source using NDJSON¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "format=ndjson" \
-d "name=events" \
-d "mode=create" \
--data-urlencode "schema=date DateTime \`json:$.date\`, event String \`json:$.event\`"
Note that we are escaping the backticks with a backslash only because this is a curl command; otherwise it’s not needed.
JSONPaths
When creating a Data Source using NDJSON/Parquet data, you have to provide a JSONPath for each column in the schema, using the JSONPath syntax.
Let’s see a complete example. Given this NDJSON object:
{ "field": "test", "nested": { "nested_field": "bla" }, "an_array": [1, 2, 3], "a_nested_array": { "nested_array": [1, 2, 3] } }
The schema would be something like this:
schema with jsonpath¶
a_nested_array_nested_array Array(Int16) `json:$.a_nested_array.nested_array[:]`, an_array Array(Int16) `json:$.an_array[:]`, field String `json:$.field`, nested_nested_field String `json:$.nested.nested_field`
Our JSONPath syntax support has some limitations: it supports nested objects at multiple levels, but it only supports nested arrays at the first level, as in the example above. To ingest and transform more complex JSON objects, use the root object JSONPath syntax as described in the next section.
JSONPaths and the root object
Defining a column as “column_name String `json:$`” in the Data Source schema will ingest each line of the NDJSON file as a String in the column_name column.
This is very handy in these scenarios:
When you have nested arrays, such as polygons:
Nested arrays¶{ "id": 49518, "polygon": [ [ [30.471785843000134,-1.066836591999916], [30.463855835000118,-1.075127054999925], [30.456156047000093,-1.086082457999908], [30.453003785000135,-1.097347919999962], [30.456311076000134,-1.108096617999891], [30.471785843000134,-1.066836591999916] ] ] }
You can parse the id and then store the whole JSON string in the root column, so you can later extract the polygon with JSON functions.
schema definition¶
id String `json:$.id`, root String `json:$`
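As a hedged sketch of that extraction, assuming a hypothetical Data Source named polygons with the schema above and using the Query API together with ClickHouse’s JSONExtractRaw function:
curl \
-H "Authorization: Bearer <your Auth token>" \
-G "https://api.tinybird.co/v0/sql" \
--data-urlencode "q=SELECT id, JSONExtractRaw(root, 'polygon') AS polygon FROM polygons LIMIT 10"
The same kind of query can be used in a Pipe instead of being sent to the Query API directly.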
When you have complex objects:
Complex JSON objects¶{ "elem": { "payments": [ { "users": [ { "user_id": "Admin_XXXXXXXXX", "value": 4 } ] } ] } }
The same applies to more complex objects; in this case, we just push the whole JSON string into the root column:
schema definition¶root String `json:$`
Schemaless events:
Schemaless events¶{ "user_id": "1", "data": { "whatever": "bla", "whatever2": "bla" } } { "user_id": "1", "data": [1, 2, 3] }
schema definition¶root String `json:$`
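A minimal sketch of creating such a schemaless Data Source (the schemaless_events name is just illustrative; the pattern is the same as the NDJSON creation example above):
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "format=ndjson" \
-d "name=schemaless_events" \
-d "mode=create" \
--data-urlencode "schema=root String \`json:$\`"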
Using the Analyze API
You don’t need to remember the exact JSONPath notation when creating an NDJSON/Parquet Data Source; just rely on the Analyze API and it’ll guess a valid schema based on a sample NDJSON/Parquet file:
Analyze an NDJSON file to get a valid schema¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/analyze" \
-F "ndjson=@local_file_path"
Successful analyze response¶{ "analysis": { "columns": [ { "path": "$.a_nested_array.nested_array[:]", "recommended_type": "Array(Int16)", "present_pct": 3, "name": "a_nested_array_nested_array" }, { "path": "$.an_array[:]", "recommended_type": "Array(Int16)", "present_pct": 3, "name": "an_array" }, { "path": "$.field", "recommended_type": "String", "present_pct": 1, "name": "field" }, { "path": "$.nested.nested_field", "recommended_type": "String", "present_pct": 1, "name": "nested_nested_field" } ], "schema": "a_nested_array_nested_array Array(Int16) `json:$.a_nested_array.nested_array[:]`, an_array Array(Int16) `json:$.an_array[:]`, field String `json:$.field`, nested_nested_field String `json:$.nested.nested_field`" }, "preview": { "meta": [ { "name": "a_nested_array_nested_array", "type": "Array(Int16)" }, { "name": "an_array", "type": "Array(Int16)" }, { "name": "field", "type": "String" }, { "name": "nested_nested_field", "type": "String" } ], "data": [ { "a_nested_array_nested_array": [ 1, 2, 3 ], "an_array": [ 1, 2, 3 ], "field": "test", "nested_nested_field": "bla" } ], "rows": 1, "statistics": { "elapsed": 0.00032175, "rows_read": 2, "bytes_read": 142 } } }
Now you can create the Data Source using the schema in the analyze response, as in the sketch below.
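A minimal sketch of that creation call (the events_example name is illustrative; the schema is the one returned by the Analyze API above):
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "format=ndjson" \
-d "name=events_example" \
-d "mode=create" \
--data-urlencode "schema=a_nested_array_nested_array Array(Int16) \`json:$.a_nested_array.nested_array[:]\`, an_array Array(Int16) \`json:$.an_array[:]\`, field String \`json:$.field\`, nested_nested_field String \`json:$.nested.nested_field\`"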
Using a URL
When using a URL for the NDJSON/Parquet file, the URL must be remotely accessible and the url param must be encoded.
Creating a Data Source from a remote NDJSON file¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "mode=append" \
-d "format=ndjson" \
-d "name=events" \
--data-urlencode "url=http://example.com/file.ndjson"
When importing an NDJSON/Parquet file via a URL, the response will not be the final result of the import but a Job. You can check the job status and progress using the Jobs API. In the response, id, job_id, and import_id should have the same value.
Using a local NDJSON file
You can also use this endpoint to import data to a Data Source from a local file.
Creating a Data Source from local NDJSON files¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "mode=append" \
-d "format=ndjson" \
-d "name=events" \
-F ndjson=@local_file.ndjson
Successful response¶{ "id": "e9ae235f-f139-43a6-7ad5-a1e17c0071c2", "datasource": { "id": "t_0ab7a11969fa4f67985cec481f71a5c2", "name": "your_datasource_name", "cluster": null, "tags": {}, "created_at": "2019-03-12 17:45:04", "updated_at": "2019-03-12 17:45:04", "statistics": { "bytes": 1397, "row_count": 4 }, "replicated": false, "version": 0, "project": null, "used_by": [], "type": "ndjson" }, "error": false }
In this case, you will receive the final import result in the response, so you won’t need to check a job status.
Data Source Engines
Tinybird uses ClickHouse as the underlying storage technology. ClickHouse features different strategies to store data; these strategies define not only where and how the data is stored, but also what kind of data access, queries, and availability your data has. In ClickHouse terms, a Tinybird Data Source uses a Table Engine that determines those factors.
Tinybird allows you to choose the ClickHouse Table Engine for your Data Source. To make things simpler, the API hides some of the complexity around ClickHouse’s Table Engines. For instance, it simplifies the replication configuration, so you don’t have to worry about it.
Currently, Tinybird supports the following Engines:
MergeTree
ReplacingMergeTree
SummingMergeTree
AggregatingMergeTree
CollapsingMergeTree
VersionedCollapsingMergeTree
Join
Null
If you need to use any other Table Engine, get in touch with us.
You can use the engine parameter to specify the name of any of the available engines, e.g. engine=ReplacingMergeTree. To specify the engine parameters and the engine options, you can use as many engine_* request parameters as needed.
Engine parameters and options¶
                                Engine options
                                      ↓
                                -------------
ReplacingMergeTree(insert_date) ORDER BY (pk)
                   -----------
                        ↑
                Engine parameters
Find the different parameters for each engine in the following table:
Engine parameters¶
ENGINE | SIGNATURE | PARAMETER | DESCRIPTION
---|---|---|---
ReplacingMergeTree | ([ver]) | engine_ver | Optional. The column with the version.
SummingMergeTree | ([columns]) | engine_columns | Optional. The names of columns where values will be summarized.
CollapsingMergeTree | (sign) | engine_sign | Name of the column for computing the state.
VersionedCollapsingMergeTree | (sign, version) | engine_sign | Name of the column for computing the state.
 | | engine_version | Name of the column with the version of the object state.
Join | (join_strictness, join_type, columns) | engine_join_strictness | How to perform the JOIN. Valid values include: ANY, ALL, SEMI, and ANTI.
 | | engine_join_type | How to combine the columns of the tables. Valid values include: LEFT, INNER, RIGHT, and FULL.
 | | engine_key_columns | column1[,column2], key columns from the USING clause that the JOIN operation is made with.
Creating a Data Source with a Join engine¶
curl \
-X POST \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
"https://api.tinybird.co/v0/datasources" \
-d "schema=suppkey Int16, name String, region String" \
-d "name=region_by_suppkey" \
-d "engine=Join" \
-d "engine_join_strictness=ANY" \
-d "engine_join_type=LEFT" \
-d "engine_key_columns=suppkey"
The engine options, in particular the MergeTree engine options, match ClickHouse terminology: engine_partition_key, engine_sorting_key, engine_primary_key, engine_sampling_key, engine_ttl, and engine_settings. Check the ClickHouse documentation for a detailed explanation of the different engine options available.
Creating a Data Source with a ReplacingMergeTree engine¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "schema=pk UInt64, insert_date Date, close Float32" \
-d "engine=ReplacingMergeTree" \
-d "engine_sorting_key=pk" \
-d "engine_ver=insert_date" \
-d "name=test123"
Replacing data
You can completely, or selectively, replace the content of a Data Source using the mode=replace parameter. By default, if you do not specify any condition, all the content of the Data Source will be destroyed and replaced by the new content you provide. The operation is atomic.
Replacing a Data Source from a URL¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d mode='replace' \
-d name='data_source_name' \
--data-urlencode "url=http://example.com/file.csv"
Alternatively, you might want to replace just certain rows of your Data Source. In that case, you also have to use the replace_condition parameter. This parameter defines the filter that will be applied: all rows matching the condition are deleted before ingesting the new file, and only the rows of the new file that match the condition are ingested. Note that if the source file contains rows that do not match the condition, those rows will not be ingested. This operation is atomic.
Selectively replacing rows in a Data Source from a URL¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d mode='replace' \
-d name='data_source_name' \
-d replace_condition='replacement_condition' \
--data-urlencode "url=http://example.com/file.csv"
Replacements are made by partition, so it is mandatory that the replace_condition contains the partition field.
It is important to understand that all the dependencies of the Data Source (Materialized Views) will be recomputed, so your data will be consistent after the replacement. If you have n-level dependencies, they will also be updated by this operation. Taking the example A –> B –> C, if you replace data in A, Data Sources B and C will automatically be updated accordingly. Take into account that the Partition Key of Data Source C must also be compatible with Data Source A.
You can find some examples of how to selectively replace data in our guide, and a minimal sketch follows.
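Assuming a hypothetical Data Source named events whose partition key is based on a date column, a selective replacement of a single day could look like this:
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d mode='replace' \
-d name='events' \
-d "replace_condition=date = '2018-12-01'" \
--data-urlencode "url=http://example.com/file.csv"
Only the rows of file.csv whose date is 2018-12-01 would be ingested, and only the existing rows matching that condition would be removed.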
Important notes
It’s not possible to run an append operation on a Data Source with a Join engine, or on a Data Source that has a dependent Materialized View materializing into a Data Source with a Join engine. To append data in these cases, do a replace using mode=replace, since this mode also inserts new data.
Replacements are atomic, but we cannot assure data consistency if you continue appending data to any related Data Source while the replacement takes place. The new incoming data will be discarded. This is an important limitation and we are working to provide a more flexible alternative.
Handling errors
When ingesting data using create, append, and replace operations, it’s possible to encounter errors; for example, if you’re trying to ingest a file via a URL that doesn’t exist. While in most cases we return an HTTP error, the following situation may also occur: as explained before, it’s possible that, while importing a file, some data is appended and some data goes to quarantine.
In this case, we return a 200 HTTP status code. In addition, the response includes an error message, along with the invalid_lines and the quarantine_rows:
Successful ingestion with errors¶
{ "import_id": "e9ae235f-f139-43a6-7ad5-a1e17c0071c2", "datasource": { "id": "t_0ab7a11969fa4f67985cec481f71a5c2", "name": "your_datasource_name", "cluster": null, "tags": {}, "created_at": "2019-03-12 17:45:04", "updated_at": "2019-03-12 17:45:04", "statistics": { "bytes": 1397, "row_count": 4 }, "replicated": false, "version": 0, "project": null, "used_by": [] }, "error": "There was an error with file contents: 2 rows in quarantine and 2 invalid lines", "quarantine_rows": 2, "invalid_lines": 2 }
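When that happens, one way to inspect the quarantined rows is to query the quarantine Data Source (same name plus the “_quarantine” suffix, as noted in the truncate section below). A hedged sketch using the Query API:
curl \
-H "Authorization: Bearer <your Auth token>" \
-G "https://api.tinybird.co/v0/sql" \
--data-urlencode "q=SELECT * FROM your_datasource_name_quarantine LIMIT 10"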
Request parameters¶
KEY | TYPE | DESCRIPTION
---|---|---
mode | String | Default: create. Other modes: append and replace. The create mode creates a new Data Source and attempts to import the data of the CSV if a URL is provided or the body contains any data. The append mode inserts the new rows provided into an existing Data Source (it will also create it if it does not exist yet). The replace mode will remove the previous Data Source and its data and replace it with the new one; Pipes or queries pointing to this Data Source will immediately start returning data from the new one, without disruption, once the replace operation is complete. The create mode will automatically name the Data Source if no name parameter is provided; for the append and replace modes to work, the name parameter must be provided and the schema must be compatible.
name | String | Optional. Name of the Data Source to create, append to or replace data in. This parameter is mandatory when using the append or replace modes.
url | String | Optional. The URL of the CSV with the data to be imported.
dialect_delimiter | String | Optional. The one-character string separating the fields. We try to guess the delimiter based on the CSV contents using some statistics, but sometimes we fail to identify the correct delimiter. If you know your CSV’s field delimiter, you can use this parameter to explicitly define it.
dialect_escapechar | String | Optional. The escapechar removes any special meaning from the following character. This is useful if the CSV does not use double quotes to encapsulate a column but uses double quotes in the content of a column and it is escaped with, e.g., a backslash.
schema | String | Optional. Data Source schema in the format ‘column_name Type, column_name_2 Type2, …’. When creating a Data Source with format ndjson, the schema must include the jsonpath for each column; see the JSONPaths section for more details.
engine | String | Optional. Engine for the underlying data. Requires the schema parameter.
engine_* | String | Optional. Engine parameters and options; check the Engines section for more details.
progress | String | Default: false. When using true and sending the data in the request body, Tinybird will return block status while loading, using line-delimited JSON.
token | String | Auth token with create or append permissions. Required only if no Bearer Authorization header is found.
type_guessing | String | Default: true. The type_guessing parameter is not taken into account when replacing or appending data to an existing Data Source. When using false, all columns are created as String; otherwise Tinybird tries to guess the column types based on the CSV contents. Sometimes you are not familiar with the data and the first step is to get familiar with it: by disabling type guessing, you can quickly import everything as strings, explore it with SQL, and cast it to the right type or shape in whatever way you see fit via a Pipe.
debug | String | Optional. Enables returning debug information from logs. It can include blocks, block_log and/or hook_log.
replace_condition | String | Optional. When used in combination with the replace mode, it allows you to replace a portion of your Data Source that matches the replace_condition SQL statement with the contents of the url or query passed as a parameter. See this guide to learn more.
format | String | Default: csv. Indicates the format of the data to be ingested in the Data Source. By default it is csv; specify format=ndjson for NDJSON format, and format=parquet for Parquet files.
- GET /v0/datasources/?¶
- Getting a list of your Data Sources¶
curl \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
-X GET "https://api.tinybird.co/v0/datasources"
Get a list of the Data Sources in your account.
The token you use to query the available Data Sources will determine what Data Sources get returned: only those accessible with the token you are using will be returned in the response.
Successful response¶
{ "datasources": [{ "id": "t_0ab7a11969fa4f67985cec481f71a5c2", "name": "your_datasource_name", "statistics": { "bytes": 430833, "row_count": 3980 } }] }
Request parameters¶
Key | Type | Description
---|---|---
attrs | String | Comma-separated list of the Data Source attributes to return in the response. Example: attrs=name,id,engine. Leave empty to return a full response.
Note that the statistics’ bytes and row_count attributes might be null depending on how the Data Source was created.
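For instance, to return only a few attributes for each Data Source (a sketch using the attrs parameter documented above):
curl \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
-X GET "https://api.tinybird.co/v0/datasources?attrs=name,id,engine"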
- POST /v0/datasources/(.+)/alter¶
Modify the Data Source schema.
This endpoint supports altering the following fields of a Data Source:
Request parameters¶
Key | Type | Description
---|---|---
schema | String | Optional. The whole schema; it can add new columns after the existing ones of the Data Source.
description | String | Optional. Sets the description of the Data Source.
kafka_store_raw_value | Boolean | Optional. Default: false. When set to true, the ‘value’ column of a Kafka Data Source will save the JSON as a raw string.
ttl | String | Optional. Set to any value accepted in ClickHouse for a TTL, or to ‘false’ to remove the TTL.
dry | Boolean | Optional. Default: false. Set to true to show what would be modified in the Data Source, without running any modification at all.
The schema parameter can be used to add new columns at the end of the existing ones in a Data Source.
Be aware that currently we don’t validate whether the change will affect the existing Materialized Views (MVs) attached to the Data Source being modified, so this change may break existing MVs. For example, avoid changing a Data Source that has an MV created with something like SELECT * FROM Data Source .... If you want MVs that remain forward compatible with column additions, create them specifying the columns explicitly instead of using the * operator.
Also, take into account that, for now, the only engines that support adding new columns are those in the MergeTree family.
To add a column to a Data Source, call this endpoint with the Data Source name and the new schema definition.
For example, having a Data Source created like this:
Creating a Data Source from a schema¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "name=stocks" \
-d "schema=symbol String, date Date, close Float32"
If you want to add a new column ‘concept String’, you need to call this endpoint with the new schema:
Adding a new column to an existing Data Source¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "schema=symbol String, date Date, close Float32, concept String"
If everything goes well, you will get the operations performed in the response:
ADD COLUMN operation resulted from the schema change.¶{ "operations": [ "ADD COLUMN `concept` String" ] }
You can also view the inferred operations without executing them by adding dry=true to the parameters, as in the sketch below.
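A minimal dry-run sketch, reusing the stocks example above (only the new column is illustrative; dry is the parameter documented in the table above):
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "dry=true" \
-d "schema=symbol String, date Date, close Float32, concept String"
The response lists the operations that would be executed, without modifying the Data Source.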
To modify the description of a Data Source:
Modifying the description of a Data Source¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "description=My new description"
To save the JSON as a raw string in the “value” column of a Kafka Data Source:
Saving the raw string in the value column of a Kafka Data Source¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "kafka_store_raw_value=true"
To modify the TTL of a Data Source:
Modifying the TTL of a Data Source¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "ttl=12 hours"
To remove the TTL of a Data Source:
Removing the TTL of a Data Source¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "ttl=false"
You can also alter Data Sources of type ndjson/parquet; in that case, you have to specify the JSONPath in the schema in the same way as when you created the Data Source.
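A hedged sketch, reusing the events NDJSON Data Source created earlier and adding a hypothetical city column with its JSONPath:
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/events/alter" \
--data-urlencode "schema=date DateTime \`json:$.date\`, event String \`json:$.event\`, city String \`json:$.city\`"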
- POST /v0/datasources/(.+)/truncate¶
Truncates a Data Source in your account. If the Data Source has dependent Materialized Views, those won’t be truncated in cascade; if you want to delete data from the dependent Materialized Views as well, you’ll have to make a subsequent call to this method for each of them. The Auth token in use must have the DATASOURCES:CREATE scope.
Truncating a Data Source¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/name/truncate"
This also works for the quarantine table of a Data Source. Remember that the quarantine table for a Data Source has the same name but with the “_quarantine” suffix.
Truncating the quarantine table from a Data Source¶
curl \
-H "Authorization: Bearer <DATASOURCES:DROP token>" \
-X POST "https://api.tinybird.co/v0/datasources/:name_quarantine/truncate"
- POST /v0/datasources/(.+)/delete¶
Deletes rows from a Data Source in your account given a SQL condition. The Auth token in use must have the DATASOURCES:CREATE scope.
Deleting rows from a Data Source given a SQL condition¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
--data "delete_condition=(country='ES')" \
"https://api.tinybird.co/v0/datasources/:name/delete"
When deleting rows from a Data Source, the response will not be the final result of the deletion but a Job. You can check the job status and progress using the Jobs API. In the response, id, job_id, and delete_id should have the same value:
Job created and waiting to start¶
{ "id": "job_id", "job_id": "job_id", "delete_id": "job_id", "job_url": "http://api.tinybird.co/v0/jobs/job_id", "job": { "kind": "delete_data", "...": "Full job representation as described below" }, "datasource": { "id": "t_0ab7a11969fa4f67985cec481f71a5c2", "name": "your_datasource_name" }, "status": "waiting" }
The Job status can be one of the following:
waiting: The initial status of a job. When created, a job has to wait if there are other jobs running.
working: The job operation has started.
done: The job has finished successfully.
error: The job has finished with an error.
Job has finished successfully¶{ "id": "c8ae13ef-e739-40b6-8bd5-b1e07c8671c2", "job_id": "c8ae13ef-e739-40b6-8bd5-b1e07c8671c2", "kind": "delete_data", "status": "done", "delete_condition": "column_01=1", "datasource": { "id": "t_0ab7a11969fa4f67985cec481f71a5c2", "name": "your_datasource_name" } }
If there’s been an error in the delete operation, the job response will also include a detailed error:
Job has finished with errors¶{ "id": "1f6a5a3d-cfcb-4244-ba0b-0bfa1d1752fb", "job_id": "1f6a5a3d-cfcb-4244-ba0b-0bfa1d1752fb", "kind": "delete_data", "status": "error", "delete_condition": "column_01=1", "datasource": { "id": "t_0ab7a11969fa4f67985cec481f71a5c2", "name": "your_datasource_name" }, "error": "[Error] Missing columns: 'random_column' while processing query" }
Data Source engines supported
Tinybird uses ClickHouse as the underlying storage technology. ClickHouse features different strategies to store data; these strategies define not only where and how the data is stored, but also what kind of data access, queries, and availability your data has. In ClickHouse terms, a Tinybird Data Source uses a Table Engine that determines those factors.
Currently, Tinybird supports deleting data for data sources with the following Engines:
MergeTree
ReplacingMergeTree
SummingMergeTree
AggregatingMergeTree
CollapsingMergeTree
VersionedCollapsingMergeTree
Dependent views deletion
If the Data Source has dependent Materialized Views, those won’t be deleted in cascade. If you want to delete data from the dependent Materialized Views as well, you’ll have to make a subsequent call to this method for each affected view with a proper delete_condition. This also applies to the associated quarantine Data Source.
Request parameters¶
KEY | TYPE | DESCRIPTION
---|---|---
delete_condition | String | Mandatory. A string representing the WHERE SQL clause you’d add to a regular DELETE FROM <table> WHERE <delete_condition> statement. Most of the time you might want to write a simple delete_condition such as column_name=value, but any valid SQL expression including conditional operators is valid.
- GET /v0/datasources/(.+)¶
- Getting information about a particular Data Source¶
curl \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
-X GET "https://api.tinybird.co/v0/datasources/datasource_name"
Get Data Source information and stats. The token provided must have read access to the Data Source.
Successful response¶
{ "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea", "name": "datasource_name", "statistics": { "bytes": 430833, "row_count": 3980 }, "used_by": [{ "id": "t_efdc62b5e67142bd9bf9a7f113a34353", "name": "pipe_using_datasource_name" }], "updated_at": "2018-09-07 23:50:32.322461", "created_at": "2018-11-28 23:50:32.322461", "type": "csv" }
id and name are two ways to refer to the Data Source in SQL queries and API endpoints. The only difference is that the id never changes; it will keep working even if you change the name (which is the name used to display the Data Source in the UI). In general, you can use id or name interchangeably.
Using the above response as an example:
select count(1) from datasource_name
is equivalent to
select count(1) from t_bd1c62b5e67142bd9bf9a7f113a2b6ea
The id t_bd1c62b5e67142bd9bf9a7f113a2b6ea is not a descriptive name, so you can add a description like t_my_events_datasource.bd1c62b5e67142bd9bf9a7f113a2b6ea
The statistics property contains information about the table. Those numbers are an estimation: bytes is the estimated data size on disk and row_count the estimated number of rows. These statistics are updated whenever data is appended to the Data Source.
The used_by property contains the list of Pipes that are using this Data Source. Only the Pipe id and name are sent.
The type property indicates the format used when the Data Source was created. Available formats are csv, ndjson, and parquet. The Data Source type indicates what file format you can use to ingest data.
- DELETE /v0/datasources/(.+)¶
- Dropping a Data Source¶
curl \
-H "Authorization: Bearer <DATASOURCES:DROP token>" \
-X DELETE "https://api.tinybird.co/v0/datasources/:name"
Drops a Data Source from your account. The Auth token in use must have the DROP:datasource_name scope.
- PUT /v0/datasources/(.+)¶
Update Data Source attributes
Updating the name of a Data Source¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X PUT "https://api.tinybird.co/v0/datasources/:name?name=new_name"
Request parameters¶
Key | Type | Description
---|---|---
name | String | New name for the Data Source.
token | String | Auth token. Only required if no Bearer Authorization header is sent. It should have the DATASOURCES:CREATE scope for the given Data Source.
- POST /v0/analyze/?¶
The Analyze API takes a sample of a supported file (csv, ndjson, parquet) and guesses the file format, schema, columns, types, nullable columns, and JSONPaths (in the case of NDJSON/Parquet files).
This is a helper endpoint to create Data Sources without having to write the schema manually.
Take into account that Tinybird’s guessing algorithm is not deterministic, since it analyzes a random portion of the file passed to the endpoint; that means it can guess different types or nullable columns depending on the sample analyzed. We recommend double-checking the guessed schema in case you have to make some manual adjustments.
Analyze a local file¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/analyze" \
-F "file=@path_to_local_file"
Analyze a remote file¶
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-G -X POST "https://api.tinybird.co/v0/analyze" \
--data-urlencode "url=https://example.com/file"
Analyze response¶{ "analysis": { "columns": [ { "path": "$.a_nested_array.nested_array[:]", "recommended_type": "Array(Int16)", "present_pct": 3, "name": "a_nested_array_nested_array" }, { "path": "$.an_array[:]", "recommended_type": "Array(Int16)", "present_pct": 3, "name": "an_array" }, { "path": "$.field", "recommended_type": "String", "present_pct": 1, "name": "field" }, { "path": "$.nested.nested_field", "recommended_type": "String", "present_pct": 1, "name": "nested_nested_field" } ], "schema": "a_nested_array_nested_array Array(Int16) `json:$.a_nested_array.nested_array[:]`, an_array Array(Int16) `json:$.an_array[:]`, field String `json:$.field`, nested_nested_field String `json:$.nested.nested_field`" }, "preview": { "meta": [ { "name": "a_nested_array_nested_array", "type": "Array(Int16)" }, { "name": "an_array", "type": "Array(Int16)" }, { "name": "field", "type": "String" }, { "name": "nested_nested_field", "type": "String" } ], "data": [ { "a_nested_array_nested_array": [ 1, 2, 3 ], "an_array": [ 1, 2, 3 ], "field": "test", "nested_nested_field": "bla" } ], "rows": 1, "statistics": { "elapsed": 0.000310539, "rows_read": 2, "bytes_read": 142 } } }
The columns attribute contains the guessed columns, and for each one:
path: The JSONPath syntax, in the case of NDJSON/Parquet files.
recommended_type: The guessed database type.
present_pct: If the value is lower than 1, there were nulls in the sample used for guessing.
name: The recommended column name.
The schema attribute is ready to be used in the Data Sources API.
The preview attribute contains up to 10 rows of the content of the file.
POST /v0/events¶
This endpoint is currently a feature preview and is considered experimental.
You can send individual events by just sending the JSON event in the request body.
curl \
-H "Authorization: Bearer <import_token>" \
-d '{"date": "2020-04-05 00:05:38", "city": "Chicago"}' \
'https://api.tinybird.co/v0/events?name=events_test'
You can also send more than one event per request, using \n to separate them. This is far more efficient than sending one request per event: if you can batch events, ingestion will be faster.
Note the $ before the JSON events. It’s needed in order for Bash to replace the \n; curl doesn’t do it automatically.¶
curl \
-H "Authorization: Bearer <import_token>" \
-d $'{"date": "2020-04-05 00:05:38", "city": "Chicago"}\n{"date": "2020-04-05 00:07:22", "city": "Madrid"}\n' \
'https://api.tinybird.co/v0/events?name=events_test'
Appends several events in NDJSON format to a JSON Data Source. If the Data Source doesn’t exist, it is created with a guessed schema. Requests to this endpoint are not restricted by the regular rate limits.
Key | Type | Description
---|---|---
name | String | Name of the target Data Source to append data to.
wait | Boolean | Set to ‘false’ to avoid waiting until the write is acknowledged.
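For instance, a sketch of a batch append that does not wait for the write to be acknowledged, using the wait parameter documented above:
curl \
-H "Authorization: Bearer <import_token>" \
-d $'{"date": "2020-04-05 00:05:38", "city": "Chicago"}\n{"date": "2020-04-05 00:07:22", "city": "Madrid"}\n' \
'https://api.tinybird.co/v0/events?name=events_test&wait=false'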