Managing your Datasources

The Datasource API enables you to create and manage your Datasources. In order to use the Datasource API, you must use an Auth token with the appropriate Datasource management permissions.
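
Every request below expects such a token. As a minimal sketch, assuming a placeholder token, you can pass it either as the token query string parameter (documented in the parameter tables below) or as an Authorization bearer header, the usual HTTP convention:

Passing the Auth token
curl -X GET "https://api.tinybird.co/v0/datasources?token=<your_auth_token>"
curl -X GET "https://api.tinybird.co/v0/datasources" -H "Authorization: Bearer <your_auth_token>"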

Importing data into Tinybird

The Tinybird engine is specifically designed to ingest, process and analyze data in CSV format. CSV files must have one line for each row of data and comma-separated fields, with the column headers in the first row.
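
For instance, a small CSV file that meets those requirements (the columns here are purely illustrative) looks like this:

date,product_id,amount
2019-01-01,prod_1,3
2019-01-02,prod_2,5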

You can import your data into Tinybird by creating a new Datasource. Tinybird automatically detects and optimizes your column types, so you can start analyzing your data right away without worrying about anything else. Advanced users should explicitly analyze their data before importing.

If you are looking to create a new Pipe, take a look at the Pipe API reference.

GET /v0/datasources/?
Getting a list of your datasources
curl -X GET "https://api.tinybird.co/v0/datasources"

Get a list of the datasources in your account.

The Datasources in the response will be those accessible with the provided token, which must have read permissions for them.

Successful response
{
    "datasources": [{
        "id": "datasource_id",
        "name": "datasource_name",
        "stats": {
            "bytes": 430833,
            "row_count": 3980
        }
    }]
}

Note that stats might be empty depending on the Datasource's source.
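
For example, a quick way to pull each datasource's name and row count out of that response, assuming jq is installed and using a placeholder token:

curl -s "https://api.tinybird.co/v0/datasources?token=<your_auth_token>" | \
jq '.datasources[] | {name, rows: .stats.row_count}'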

POST /v0/datasources/?
Creating a datasource from a remote file
curl -X POST "https://api.tinybird.co/v0/datasources?url=http://example.com/file.csv"

You can create Datasources from an HTTP endpoint where a CSV file is served. The file must be remotely accessible. If the server supports HTTP Range headers, the import process is parallelized.

To create an empty Datasource, set the schema parameter with your desired structure and leave the url parameter empty.

The Datasource schema and sharding key are guessed from the CSV contents, but you can force a specific schema, as shown in the example after this list. There are two cases where setting the schema explicitly is the best option:

  • When you want to optimize. For example, you know an integer column is 32-bit instead of 64-bit.

  • When guessing fails. You can find out which schema the import endpoint will use by calling the Import Analyze endpoint beforehand.
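
As a sketch of forcing a schema, assuming a hypothetical remote file and reusing the column types from the analyze example further below (curl's -G --data-urlencode combination keeps the parameters URL-encoded, while -X POST preserves the method):

Forcing the schema on import
curl -G -X POST "https://api.tinybird.co/v0/datasources" \
    --data-urlencode "url=http://example.com/file.csv" \
    --data-urlencode "schema=VendorID Integer, tpep_pickup_datetime DateTime"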

If any imported row does not match the Datasource schema, it is sent to a “quarantine” Datasource. This “quarantine” Datasource is automatically created for every Datasource and contains the same columns as the original, but with String as the data type for all of them. Those records can be processed later to recover them.
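
Quarantined records can be queried like any other Datasource. As an illustrative sketch, assuming the quarantine Datasource takes the original name plus a _quarantine suffix (an assumption for this example, not something this reference guarantees):

select count(1) from events_table_quarantine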

Request parameters

mode (String): Default: create. Other modes: append. The default create mode produces a new datasource. The append mode inserts new data into an existing datasource; in that case the name parameter is mandatory and the schema must be compatible.

name (String): Optional. Name given to the imported Datasource. This parameter is mandatory when using the append mode.

schema (String): Optional. Datasource schema in the format ‘column Type, column2 Type2…’

url (String): Optional. URL from where the data will be imported.

progress (String): Default: false. When true, block status updates are streamed while loading, using line-delimited JSON.

dry (String): Default: false. Analyzes how the data would be imported without actually creating a new datasource.

token (String): Auth token with create or append permissions.
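
As a sketch of the append mode, assuming a Datasource named events_table already exists and the remote file has a compatible schema:

Appending data to an existing datasource
curl -G -X POST "https://api.tinybird.co/v0/datasources" \
    --data-urlencode "mode=append" \
    --data-urlencode "name=events_table" \
    --data-urlencode "url=http://example.com/more_rows.csv"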

Successful response
{
  "job": "<job_id>",
  "job_url": "https://api.tinybird.co/api/v0/jobs/<job_id>"
}

The response will not be the final result but a job_id. You can check the job status and progress at https://api.tinybird.co/v0/jobs/<job_id>
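
For instance, checking on the job is a simple GET (the <job_id> placeholder comes from the response above):

Checking the import job status
curl -X GET "https://api.tinybird.co/v0/jobs/<job_id>"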

You can also use this endpoint to create a Datasource from a local file. Note that you won’t need to check your job status when doing so.

Creating a datasource from local files
curl -F csv=@local_file.csv "https://api.tinybird.co/v0/datasources"

Analyzing your data before importing it

You can also use this endpoint to analyze how the file will be processed without actually creating the new Datasource. You just need to add the dry=true parameter to your request. The Auth token must contain the DATASOURCES:CREATE scope.

Analyzing remote files prior to importing
curl -X POST "https://api.tinybird.co/v0/datasources?dry=true&url=http://example.com/file.csv"

You can also analyze a local file.

Analyzing local files prior to importing
head -n 2000 file.csv | \
curl --data-binary @- "https://api.tinybird.co/v0/datasources?dry=true"

As a response you will get three things:

  • The data schema that will be used by default.

  • Sharding and other data distribution keys.

  • CSV dialect and encoding.

All of those can be overridden when importing, to fix guessing issues or to force a specific schema (useful when optimizing storage or some specific queries).

Successful response
{
  "sql_schema": "VendorID Integer,tpep_pickup_datetime DateTime",
  "dialect": {
    "new_line": "\r",
    "has_header": 1,
    "delimiter": ","
  },
  "data_distribution": {
    "sampling": "intHash32(`payment_type`)",
    "sharding_key": "toYYYYMM(`tpep_pickup_datetime`)"
  },
  "encoding": "utf-8",
  "schema": [
    {
      "auto": false,
      "nullable": false,
      "type": "Integer",
      "normalized_name": "VendorID",
      "name": "VendorID"
    },
    {
      "auto": false,
      "nullable": false,
      "type": "DateTime",
      "normalized_name": "tpep_pickup_datetime",
      "name": "tpep_pickup_datetime"
    }
   ]
}
GET /v0/datasources/(.+)
Getting information about a particular datasource
curl -X GET "https://api.tinybird.co/v0/datasources/datasource_name"

Get Datasource information and stats. The provided token must have read access to the Datasource.

Successful response
{
    "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea",
    "name": "datasource_name",
    "stats": {
        "bytes": 430.8330078125,
        "row_count": 3980
    },
    "used_by": [{
        "id": "t_efdc62b5e67142bd9bf9a7f113a34353",
        "name": "pipe_using_datasource_name"
    }]
}

id and name are two ways to refer to the datasource in SQL queries and API endpoints. The only difference is that the id never changes, so it keeps working even if you change the name (which is the name used to display the datasource). In general you can use id and name interchangeably:

select count(1) from events_table

is equivalent to

select count(1) from t_bd1c62b5e67142bd9bf9a7f113a2b6ea

t_bd1c62b5e67142bd9bf9a7f113a2b6ea is not a descriptive name, so you can add a description to it, like t_my_events_datasource.bd1c62b5e67142bd9bf9a7f113a2b6ea
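
For example, assuming the descriptive form resolves just like the plain id, the query above can also be written as:

select count(1) from t_my_events_datasource.bd1c62b5e67142bd9bf9a7f113a2b6ea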

used_by contains the list of Pipes that are using this datasource. Only the Pipe id and name are sent.

DELETE /v0/datasources/(.+)
Dropping a datasource
curl -X DELETE "https://api.tinybird.co/v0/datasources/name"

Drops a datasource from your account. The Auth token in use must have the DROP:datasource_name scope.

PUT /v0/datasources/(.+)

Changes datasource attributes

Changing the datasource name
curl -X PUT "https://api.tinybird.co/v0/datasources/:name?name=new_name"
Request parameters

name (String): New name for the datasource.

token (String): Auth token. Ensure it has the DATASOURCES:CREATE scope on it.

If another datasource with the same name already exists, an error is raised.