Datasource API - Importing Data and Managing your Datasources
The Datasource API enables you to create and manage your Data Sources, as well as import data into them.
To use the Datasource API, you must use an Auth token with the right permissions, depending on whether you want to CREATE, APPEND, READ or DROP (or a combination of those).
Importing data into Tinybird Analytics
Tinybird Analytics is specifically designed to ingest, process and analyze data in CSV format. CSV files must have one line for each row of data and comma-separated fields, with the column headers in the first row.
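For example, a minimal CSV file that meets these requirements looks like this (the column names here are just an illustration):

vendor_id,pickup_datetime,total_amount
1,2018-12-01 00:12:43,11.30
2,2018-12-01 00:15:02,7.80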
You can import your data into Tinybird Analytics by creating a new Datasource. Tinybird Analytics automatically detects and optimizes your column types, so you can start analyzing your data right away.
curl \
-H "Authorization: Bearer <import_token>" \
-X POST "https://api.tinybird.co/v0/datasources?url=https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2018-12.csv"
Advanced users can explicitly analyze their data before importing.
If you are looking for information on how to create a new Pipe, take a look at the Pipe API reference.
GET /v0/datasources/? - Get a list of your datasources
curl \ -H "Authorization: Bearer <DATASOURCES:READ token>" \ -X GET "https://api.tinybird.co/v0/datasources"
Get a list of the datasources in your account.
The token you use determines which Datasources get returned: only those accessible with that token will be included in the response.
Successful response
{
    "datasources": [{
        "id": "datasource_id",
        "name": "datasource_name",
        "stats": {
            "bytes": 430833,
            "row_count": 3980
        }
    }]
}
Note that stats might be empty depending on how the datasource was created.
POST /v0/datasources/? - Creating a datasource from a remote file
curl \ -H "Authorization: Bearer <import_token>" \ -X POST "https://api.tinybird.co/v0/datasources?url=http://example.com/file.csv"
You can create datasources by specifying the URL of a CSV file. The file must be remotely accessible. If the server supports HTTP Range headers, the import process will be parallelized.
To create an empty Datasource, pass a schema with your desired column names and types and leave the url parameter empty (see the sketch after the list below).
The Datasource schema and sharding key are guessed from the CSV. However, you can also force the schema. There are two cases where setting the schema explicitly is the best option:
For optimization purposes. For example, you know an integer is a 32 bit one instead of a 64 bits one.
When Tinybird’s guessing fails. We do our best, but sometimes we get it wrong!
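As a sketch of the empty-Datasource case, the following request passes an explicit schema and no url; the Datasource name and columns here are hypothetical:

curl \
-H "Authorization: Bearer <import_token>" \
--data-urlencode "name=my_events" \
--data-urlencode "schema=event_name String, event_time DateTime" \
"https://api.tinybird.co/v0/datasources"

Using --data-urlencode keeps the spaces in the schema definition from breaking the request.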
If you want to know what schema Tinybird will attempt to use for a given CSV file, try analyzing it by performing a dry run of the import. You can do that by passing a dry=true parameter.
Imports do not stop when Tinybird finds rows that do not match the Datasource schema; instead, those rows are stored in a “quarantine” Datasource. This “quarantine” Datasource is automatically created along with each Datasource and contains the same columns as the original Datasource, but with String as the data type. Those records can be processed later to be recovered.
Request parameters
mode (String): Default: create. Other modes: append, replace. The default create mode creates a new datasource and attempts to import the data of the CSV if a url is provided. The append mode inserts the new rows provided into an existing datasource (it will also create it if it does not yet exist). replace will remove the previous Data Source and its data and replace it with the new one; Pipes or queries pointing to this Data Source will immediately start returning data from the new one, without disruption, once the replace operation is complete. create will automatically name the datasource if no name parameter is provided; for the append mode to work, the name parameter must be provided and the schema must be compatible.
name (String): Optional. Name of the Datasource being created or where data is to be appended. This parameter is mandatory when using the append mode.
schema (String): Optional. Datasource schema in the format 'column_name Type, column_name_2 Type2…'
engine (String): Optional. Engine for the underlying data. Requires the schema parameter.
url (String): Optional. The URL of the CSV with the data to be imported.
progress (String): Default: false. When true, Tinybird will return block status while loading, using line-delimited JSON.
dry (String): Default: false. Analyzes how the data will get imported and returns the suggested schema and other information without actually creating a new datasource.
token (String): Auth token with create or append permissions. Required only if no Bearer Authorization header is found.
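As a sketch of the append and replace modes against an existing Datasource (the events name is hypothetical):

curl \
-H "Authorization: Bearer <import_token>" \
-X POST "https://api.tinybird.co/v0/datasources?mode=append&name=events&url=http://example.com/new_rows.csv"

curl \
-H "Authorization: Bearer <import_token>" \
-X POST "https://api.tinybird.co/v0/datasources?mode=replace&name=events&url=http://example.com/full_snapshot.csv"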
Successful response
{
    "job": "<job_id>",
    "job_url": "https://api.tinybird.co/api/v0/jobs/<job_id>"
}
Note that when importing a CSV via a URL, the response will not be the final result of the import but a job_id. You can check the job status and progress at the URL returned in job_url.
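For example, you could poll the job by requesting the returned job_url; this is a sketch, as the exact fields of the job payload are not documented here:

curl \
-H "Authorization: Bearer <import_token>" \
-X GET "https://api.tinybird.co/api/v0/jobs/<job_id>"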
You can also use this endpoint to create a Datasource from a local file. Note that you won’t need to check your job status when doing so.
Creating a datasource from local files
curl \
-H "Authorization: Bearer <import_token>" \
-F csv=@local_file.csv "https://api.tinybird.co/v0/datasources"
Analyzing your data before importing it
You can also use this endpoint to analyze how the file will be processed without actually creating the new Datasource. You just need to add the dry=true parameter to your request. The Auth token must contain the DATASOURCES:CREATE scope.
Analyzing remote files prior to importing
curl \
-H "Authorization: Bearer <import_token>" \
-X POST "https://api.tinybird.co/v0/datasources?dry=true&url=http://example.com/file.csv"
You can also analyze a local file.
Analyzing local files prior to importing
head -n 2000 file.csv | \
curl \
-H "Authorization: Bearer <import_token>" \
--data-binary @- "https://api.tinybird.co/v0/datasources?dry=true"
As a response you will get three things:
The data schema that will be used by default.
Sharding and other data distribution keys.
CSV dialect and encoding.
Successful response
{
    "sql_schema": "VendorID Integer,tpep_pickup_datetime DateTime",
    "dialect": {
        "new_line": "\r",
        "has_header": 1,
        "delimiter": ","
    },
    "data_distribution": {
        "sampling": "intHash32(`payment_type`)",
        "sharding_key": "toYYYYMM(`tpep_pickup_datetime`)"
    },
    "encoding": "utf-8",
    "schema": [
        {
            "auto": false,
            "nullable": false,
            "type": "Integer",
            "normalized_name": "VendorID",
            "name": "VendorID"
        },
        {
            "auto": false,
            "nullable": false,
            "type": "DateTime",
            "normalized_name": "tpep_pickup_datetime",
            "name": "tpep_pickup_datetime"
        }
    ]
}
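One way to use this output is to feed the suggested sql_schema back into the actual import, adjusting any types Tinybird guessed wrong. This sketch assumes the endpoint accepts its parameters as form-encoded POST data as well as query-string parameters:

curl \
-H "Authorization: Bearer <import_token>" \
--data-urlencode "url=http://example.com/file.csv" \
--data-urlencode "schema=VendorID Integer,tpep_pickup_datetime DateTime" \
"https://api.tinybird.co/v0/datasources"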
GET /v0/datasources/(.+) - Get information about a particular datasource
curl \ -H "Authorization: Bearer <DATASOURCES:READ token>" \ -X GET "https://api.tinybird.co/v0/datasources/datasource_name"
Get datasource information and stats. The token provided must have read access to the datasource.
Successful response
{
    "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea",
    "name": "datasource_name",
    "statistics": {
        "bytes": 430833,
        "row_count": 3980
    },
    "used_by": [{
        "id": "t_efdc62b5e67142bd9bf9a7f113a34353",
        "name": "pipe_using_datasource_name"
    }],
    "updated_at": "2018-09-07 23:50:32.322461",
    "created_at": "2018-11-28 23:50:32.322461"
}
id and name are two ways to refer to the datasource in SQL queries and API endpoints. The only difference is that the id never changes; it will work even if you change the name (which is the name used to display the datasource in the UI). In general, you can use id and name interchangeably. Using the above response as an example:
select count(1) from datasource_name
is equivalent to
select count(1) from t_bd1c62b5e67142bd9bf9a7f113a2b6ea
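The same equivalence holds for API endpoints. Assuming the Datasource from the response above, both of these requests return the same information:

curl \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
-X GET "https://api.tinybird.co/v0/datasources/datasource_name"

curl \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
-X GET "https://api.tinybird.co/v0/datasources/t_bd1c62b5e67142bd9bf9a7f113a2b6ea"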
The id t_bd1c62b5e67142bd9bf9a7f113a2b6ea is not a descriptive name, so you can add a description like t_my_events_datasource.bd1c62b5e67142bd9bf9a7f113a2b6ea
The statistics property contains information about the table. Those numbers are an estimation: bytes is the estimated data size on disk and row_count the estimated number of rows. These statistics are updated whenever data is appended to the datasource.
The used_by property contains the list of Pipes that are using this datasource. Only the Pipe id and name are sent.
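As a usage sketch, you could combine this endpoint with jq (assuming it is installed) to extract a single statistic from the response:

curl -s \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
"https://api.tinybird.co/v0/datasources/datasource_name" | jq '.statistics.row_count'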
DELETE /v0/datasources/(.+) - Drop a datasource
curl \ -H "Authorization: Bearer <DATASOURCES:DROP token>" \ -X DELETE "https://api.tinybird.co/v0/datasources/name"
Drops a datasource from your account. The Auth token in use must have the DROP:datasource_name scope.
PUT /v0/datasources/(.+) - Update Datasource attributes
Updating the name of a Datasource
curl \
-H "Authorization: Bearer <import_token>" \
-X PUT "https://api.tinybird.co/v0/datasources/:name?name=new_name"
Request parameters
name (String): The new name of the Datasource.
token (String): Auth token. Only required if no Bearer Authorization header is sent. It should have the DATASOURCES:CREATE scope for the given Datasource.