Datasource API - Importing Data and Managing your Datasources

The Datasource API enables you to create and manage your Datasources, as well as import data into them.

In order to use the Datasource API, you must use an Auth token with the right permissions, depending on whether you want to CREATE, APPEND, READ or DROP (or a combination of those).

Importing data into Tinybird Analytics

Tinybird Analytics is specifically designed to ingest, process and analyze data in CSV format. CSV files must have one line for each row of data and have comma-separated fields, with the column headers in the first row.

You can import your data into Tinybird Analytics by creating a new Datasource. Tinybird Analytics will automatically detect and optimize your column types so you don’t have to worry about anything and can start analyzing your data right away.

Creating a datasource from a remote file
curl \
-H "Authorization: Bearer <import_token>" \
-X POST "https://api.tinybird.co/v0/datasources?url=https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2018-12.csv"

Advanced users can explicitly analyze their data before importing.

If you are looking for information on how to create a new Pipe, take a look at the Pipe API reference.

GET /v0/datasources/?
Get a list of your datasources
curl \
-H "Authorization: Bearer <token>" \
-X GET "https://api.tinybird.co/v0/datasources"

Get a list of the datasources in your account.

The token you use to query the available Datasources will determine what Datasources get returned: only those accessible with the token you are using will be returned in the response.

Successful response
{
    "datasources": [{
        "id": "datasource_id",
        "name": "datasource_name",
        "stats": {
            "bytes": 430833,
            "row_count": 3980
        }
    }]
}

Note that stats might be empty depending on how the datasource was created.

POST /v0/datasources/?
Creating a datasource from a remote file
curl \
-H "Authorization: Bearer <token>" \
-X POST "https://api.tinybird.co/v0/datasources?url=http://example.com/file.csv"

You can create datasources by specifying the URL of a CSV file. The file should be remotely accessible. If the server supports HTTP Range headers, the import process will be parallelized.

To create an empty Datasource, you must pass a schema with your desired column names and types and leave the url parameter empty.
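
For instance, creating an empty Datasource might look like the following sketch; the Datasource name and column types here are illustrative, not taken from the source above:

```shell
# Create an empty Datasource by passing only name and schema (no url).
# -G turns the --data-urlencode values into query-string parameters.
curl \
-H "Authorization: Bearer <token>" \
-X POST -G "https://api.tinybird.co/v0/datasources" \
--data-urlencode "name=my_empty_datasource" \
--data-urlencode "schema=event_time DateTime, user_id UInt64"
```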

The Datasource schema and sharding key are guessed from the CSV. However, you can also force the schema. There are two cases where setting the schema explicitly is the best option:

  • For optimization purposes. For example, you know an integer fits in 32 bits rather than 64.

  • When Tinybird’s guessing fails. We do our best, but sometimes we get it wrong!
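
As a sketch, forcing a 32-bit integer column when importing from a URL could look like this (the Datasource name and column names are illustrative):

```shell
# Force the schema instead of letting Tinybird guess it from the CSV.
# vendor_id is pinned to Int32 rather than a guessed 64-bit type.
curl \
-H "Authorization: Bearer <token>" \
-X POST -G "https://api.tinybird.co/v0/datasources" \
--data-urlencode "name=trips" \
--data-urlencode "schema=vendor_id Int32, pickup_datetime DateTime" \
--data-urlencode "url=http://example.com/file.csv"
```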

If you want to know what schema Tinybird will attempt to use for a given CSV file, try analyzing it by performing a dry run of the import. You can do that by passing a dry=true parameter.

Imports do not stop when Tinybird finds rows that do not match the Datasource schema; instead, those rows are stored in a “quarantine” Datasource. This “quarantine” Datasource is automatically created along with each Datasource and contains the same columns as the original Datasource, but with String as the data type. Those records can later be processed and recovered.
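
Assuming the quarantine Datasource is exposed under a <datasource_name>_quarantine naming convention (an assumption; check your account’s datasource list if yours differs), you can inspect it like any other Datasource:

```shell
# Inspect the quarantine Datasource for "my_datasource".
# The _quarantine suffix is assumed, not documented above.
curl \
-H "Authorization: Bearer <token>" \
-X GET "https://api.tinybird.co/v0/datasources/my_datasource_quarantine"
```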

Request parameters

  • mode (String): Default: create. Other modes: append, replace. The default create mode creates a new datasource and attempts to import the data of the CSV if a url is provided. The append mode inserts the new rows provided into an existing datasource (it will also create it if it does not yet exist). The replace mode removes the previous datasource and its data and replaces it with the new one; once the replace operation is complete, Pipes or queries pointing to this datasource will immediately start returning data from the new one, without disruption. create will automatically name the datasource if no name parameter is provided; for the append mode to work, the name parameter must be provided and the schema must be compatible.

  • name (String): Optional. Name of the Datasource being created or where data is to be appended. This parameter is mandatory when using the append mode.

  • schema (String): Optional. Datasource schema in the format ‘column_name Type, column_name_2 Type2…’

  • engine (String): Optional. Engine for the underlying data. Requires the schema parameter.

  • url (String): Optional. The URL of the CSV with the data to be imported.

  • progress (String): Default: false. When true, Tinybird will return the status of each block while loading, using line-delimited JSON.

  • dry (String): Default: false. Analyzes how the data will get imported and returns the suggested schema and other information without actually creating a new datasource.

  • token (String): Auth token with create or append permissions. Required only if no Bearer Authorization header is found.
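
As an example of the append mode described above, a sketch of adding rows from a remote CSV to an existing Datasource (the Datasource name and file URL are illustrative):

```shell
# Append new rows to an existing datasource; the CSV columns must be
# compatible with the datasource schema, or rows go to quarantine.
curl \
-H "Authorization: Bearer <token>" \
-X POST "https://api.tinybird.co/v0/datasources?mode=append&name=events_table&url=http://example.com/new_rows.csv"
```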

Successful response
{
  "job": "<job_id>",
  "job_url": "https://api.tinybird.co/api/v0/jobs/<job_id>"
}

Note that when importing a CSV via a URL, the response will not be the final result of the import but a job_id. You can check the job status and progress at the job_url returned in the response.
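
For example, a sketch of checking on the import job; <job_url> stands for the job_url value returned by the import request (the shape of the Jobs API response is not documented here):

```shell
# Poll the job endpoint returned by the import request to track progress.
curl \
-H "Authorization: Bearer <token>" \
-X GET "<job_url>"
```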

You can also use this endpoint to create a Datasource from a local file. Note that you won’t need to check your job status when doing so.

Creating a datasource from local files
curl \
-H "Authorization: Bearer <token>" \
-F csv=@local_file.csv "https://api.tinybird.co/v0/datasources"

Analyzing your data before importing it

You can also use this endpoint to analyze how the file will be processed without actually creating the new Datasource. You just need to add the dry=true parameter to your request. The Auth token must contain the DATASOURCES:CREATE scope.

Analyzing remote files prior to importing
curl \
-H "Authorization: Bearer <token>" \
-X POST "https://api.tinybird.co/v0/datasources?dry=true&url=http://example.com/file.csv"

You can also analyze a local file.

Analyzing local files prior to importing
head -n 2000 file.csv | \
curl --data-binary @- "https://api.tinybird.co/v0/datasources?dry=true"

As a response you will get three things:

  • The data schema that will be used by default.

  • Sharding and other data distribution keys.

  • CSV dialect and encoding.

Successful response
{
  "sql_schema": "VendorID Integer,tpep_pickup_datetime DateTime",
  "dialect": {
    "new_line": "\r",
    "has_header": 1,
    "delimiter": ","
  },
  "data_distribution": {
    "sampling": "intHash32(`payment_type`)",
    "sharding_key": "toYYYYMM(`tpep_pickup_datetime`)"
  },
  "encoding": "utf-8",
  "schema": [
    {
      "auto": false,
      "nullable": false,
      "type": "Integer",
      "normalized_name": "VendorID",
      "name": "VendorID"
    },
    {
      "auto": false,
      "nullable": false,
      "type": "DateTime",
      "normalized_name": "tpep_pickup_datetime",
      "name": "tpep_pickup_datetime"
    }
   ]
}
GET /v0/datasources/(.+)
Get information about a particular datasource
curl \
-H "Authorization: Bearer <token>" \
-X GET "https://api.tinybird.co/v0/datasources/datasource_name"

Get datasource information and stats. The token provided must have read access to the datasource.

Successful response
{
    "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea",
    "name": "datasource_name",
    "statistics": {
        "bytes": 430833,
        "row_count": 3980
    },
    "used_by": [{
        "id": "t_efdc62b5e67142bd9bf9a7f113a34353",
        "name": "pipe_using_datasource_name"
    }],
    "updated_at": "2018-09-07 23:50:32.322461",
    "created_at": "2018-11-28 23:50:32.322461"
}

id and name are two ways to refer to the datasource in SQL queries and API endpoints. The only difference is that the id never changes; it will work even if you change the name (which is the name used to display the datasource in the UI). In general you can use id and name interchangeably:

Using the above response as an example:

select count(1) from datasource_name

is equivalent to

select count(1) from t_bd1c62b5e67142bd9bf9a7f113a2b6ea

The id t_bd1c62b5e67142bd9bf9a7f113a2b6ea is not a descriptive name, so you can prepend a description to it, like t_my_events_datasource.bd1c62b5e67142bd9bf9a7f113a2b6ea

The statistics property contains information about the table. Those numbers are estimates: bytes is the estimated data size on disk and row_count the estimated number of rows. These statistics are updated whenever data is appended to the datasource.

The used_by property contains the list of Pipes that are using this datasource. Only Pipe id and name are sent.

DELETE /v0/datasources/(.+)
Drops a datasource
curl \
-H "Authorization: Bearer <token>" \
-X DELETE "https://api.tinybird.co/v0/datasources/name"

Drops a datasource from your account. The Auth token in use must have the DROP:datasource_name scope.

PUT /v0/datasources/(.+)

Update Datasource attributes

Updating the name of a Datasource
curl \
-H "Authorization: Bearer <token>" \
-X PUT "https://api.tinybird.co/v0/datasources/:name?name=new_name"
Request parameters

  • name (String): Name of the Datasource.

  • token (String): Auth token. Only required if no Bearer Authorization header is sent. It should have DATASOURCES:CREATE scope for the given Datasource.