Data Source API
Use the Data Source API to list, create, update or delete your Tinybird Data Sources. You can also insert or delete data from Data Sources. Ingestion limits apply.
Adding or modifying pipes or data sources in Tinybird Local and Tinybird Forward can only be done through deployments.
All endpoints require authentication using a Token with the appropriate scope.
- POST /v0/datasources/?
This endpoint supports 3 modes to enable 3 distinct operations, depending on the parameters provided:
Create a new Data Source with a schema
Append data to an existing Data Source
Replace data in an existing Data Source
The mode is controlled by setting the `mode` parameter, for example `-d "mode=create"`. Each mode has different rate limits.

When importing remote files by URL, if the server hosting the remote file supports HTTP Range headers, the import process is parallelized.
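As a sketch of how the three modes relate to the request parameters, the following hypothetical helper (the function name and defaults are illustrative, not part of the API) builds the request URL and enforces the rule that `append` and `replace` require a `name`:

```python
from urllib.parse import urlencode

def build_datasource_request(mode="create", name=None, url=None, schema=None):
    """Build the query string for POST /v0/datasources.

    Illustrative helper: `append` and `replace` require `name`,
    while `create` can auto-name the Data Source.
    """
    if mode not in ("create", "append", "replace"):
        raise ValueError(f"unknown mode: {mode}")
    if mode in ("append", "replace") and not name:
        raise ValueError(f"mode={mode} requires the name parameter")
    params = {"mode": mode}
    if name:
        params["name"] = name
    if url:
        params["url"] = url
    if schema:
        params["schema"] = schema
    return "https://api.tinybird.co/v0/datasources?" + urlencode(params)
```

The body of the request (a local file, or nothing at all when importing by `url`) is sent separately, as the curl examples below show.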
Request parameters

| Key | Type | Description |
| --- | --- | --- |
| `mode` | String | Default: `create`. Other modes: `append` and `replace`. The `create` mode creates a new Data Source and attempts to import the data of the CSV if a URL is provided or the body contains any data. The `append` mode inserts the new rows provided into an existing Data Source (it will also create it if it does not exist yet). The `replace` mode removes the previous Data Source and its data and replaces it with the new one; Pipes or queries pointing to this Data Source will start returning data from the new one, without disruption, once the replace operation is complete. The `create` mode automatically names the Data Source if no `name` parameter is provided; for the `append` and `replace` modes to work, the `name` parameter must be provided and the schema must be compatible. |
| `name` | String | Optional. Name of the Data Source to create, append to, or replace data in. This parameter is mandatory when using the `append` or `replace` modes. |
| `url` | String | Optional. The URL of the CSV with the data to be imported. |
| `dialect_delimiter` | String | Optional. The one-character string separating the fields. Tinybird tries to guess the delimiter from the CSV contents using some statistics, but sometimes fails to identify the correct one. If you know your CSV's field delimiter, use this parameter to define it explicitly. |
| `dialect_new_line` | String | Optional. The one- or two-character string separating the records. Tinybird tries to guess the record delimiter from the CSV contents using some statistics, but sometimes fails to identify the correct one. If you know your CSV's record delimiter, use this parameter to define it explicitly. |
| `dialect_escapechar` | String | Optional. The escapechar removes any special meaning from the following character. This is useful if the CSV does not use double quotes to encapsulate a column but uses double quotes in the content of a column, escaped with, for example, a backslash. |
| `schema` | String | Optional. Data Source schema in the format 'column_name Type, column_name_2 Type2, ...'. When creating a Data Source with format `ndjson`, the `schema` must include the `jsonpath` for each column; see the JSONPaths section for more details. |
| `engine` | String | Optional. Engine for the underlying data. Requires the `schema` parameter. |
| `engine_*` | String | Optional. Engine parameters and options; check the Engines section for more details. |
| `progress` | String | Default: `false`. When using `true` and sending the data in the request body, Tinybird returns block status while loading using line-delimited JSON. |
| `token` | String | Auth token with create or append permissions. Required only if no Bearer Authorization header is found. |
| `type_guessing` | String | Default: `true`. When using `false`, all columns are created as `String`; otherwise Tinybird tries to guess the column types based on the CSV contents. The `type_guessing` parameter is not taken into account when replacing or appending data to an existing Data Source. Disabling type guessing is useful when you are not yet familiar with the data: you can quickly import everything as strings, explore it with SQL, and cast it to the right type or shape via a Pipe. |
| `debug` | String | Optional. Enables returning debug information from logs. It can include `blocks`, `block_log`, and/or `hook_log`. |
| `replace_condition` | String | Optional. When used in combination with the `replace` mode, it allows you to replace the portion of your Data Source that matches the `replace_condition` SQL statement with the contents of the `url` or query passed as a parameter. See this guide to learn more. |
| `replace_truncate_when_empty` | Boolean | Optional. When used in combination with the `replace` mode, it allows truncating the Data Source when empty data is provided. Not supported when `replace_condition` is specified. |
| `format` | String | Default: `csv`. Indicates the format of the data to be ingested in the Data Source. Specify `format=ndjson` for NDJSON and `format=parquet` for Parquet files. |

Examples
Creating a CSV Data Source from a schema

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d "name=stocks" \
  -d "schema=symbol String, date Date, close Float32"
```

Creating a CSV Data Source from a local CSV file with schema inference

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources?name=stocks" \
  -F csv=@local_file.csv
```

Creating a CSV Data Source from a remote CSV file with schema inference

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d "name=stocks" \
  -d url='https://.../data.csv'
```

Creating an empty Data Source with a ReplacingMergeTree engine and custom engine settings

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d "schema=pk UInt64, insert_date Date, close Float32" \
  -d "engine=ReplacingMergeTree" \
  -d "engine_sorting_key=pk" \
  -d "engine_ver=insert_date" \
  -d "name=test123" \
  -d "engine_settings=index_granularity=2048, ttl_only_drop_parts=false"
```

Appending data to a Data Source from a local CSV file

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources?name=data_source_name&mode=append" \
  -F csv=@local_file.csv
```

Appending data to a Data Source from a remote CSV file

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d mode='append' \
  -d name='data_source_name' \
  -d url='https://.../data.csv'
```

Replacing data with a local file

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources?name=data_source_name&mode=replace" \
  -F csv=@local_file.csv
```

Replacing data with a remote file from a URL

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d mode='replace' \
  -d name='data_source_name' \
  --data-urlencode "url=http://example.com/file.csv"
```

Importing a .log local file as NDJSON

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources?name=logs&format=ndjson" \
  -F ndjson=@local_file.log
```
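The CSV dialect and `type_guessing` parameters combine with any of the requests above. A minimal sketch of building such a request (the Data Source name and file URL are assumptions for illustration; no request is actually sent):

```python
from urllib.parse import urlencode

# Import a semicolon-delimited CSV with every column created as String,
# so the data can be explored first and cast to proper types in a Pipe later.
params = {
    "name": "raw_events",  # assumed Data Source name
    "mode": "create",
    "dialect_delimiter": ";",
    "type_guessing": "false",
    "url": "https://example.com/data.csv",  # assumed file location
}
import_url = "https://api.tinybird.co/v0/datasources?" + urlencode(params)
```

POSTing to `import_url` with a `DATASOURCES:CREATE` token would then behave like the remote-file examples above, but with the delimiter forced and type guessing disabled.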
- GET /v0/datasources/?

- Getting a list of your Data Sources

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:READ token>" \
  -X GET "https://api.tinybird.co/v0/datasources"
```
Get a list of the Data Sources in your account.
The token you use determines which Data Sources are returned: only those accessible with that token are included in the response.
Successful response

```json
{
  "datasources": [{
    "id": "t_a049eb516ef743d5ba3bbe5e5749433a",
    "name": "your_datasource_name",
    "cluster": "tinybird",
    "tags": {},
    "created_at": "2019-11-13 13:53:05.340975",
    "updated_at": "2022-02-11 13:11:19.464343",
    "replicated": true,
    "version": 0,
    "project": null,
    "headers": {},
    "shared_with": [
      "89496c21-2bfe-4775-a6e8-97f1909c8fff"
    ],
    "engine": {
      "engine": "MergeTree",
      "engine_sorting_key": "example_column_1",
      "engine_partition_key": "",
      "engine_primary_key": "example_column_1"
    },
    "description": "",
    "used_by": [],
    "type": "csv",
    "columns": [{
      "name": "example_column_1",
      "type": "Date",
      "codec": null,
      "default_value": null,
      "jsonpath": null,
      "nullable": false,
      "normalized_name": "example_column_1"
    }, {
      "name": "example_column_2",
      "type": "String",
      "codec": null,
      "default_value": null,
      "jsonpath": null,
      "nullable": false,
      "normalized_name": "example_column_2"
    }],
    "statistics": {
      "bytes": 77822,
      "row_count": 226188
    },
    "new_columns_detected": {},
    "quarantine_rows": 0
  }]
}
```
Request parameters

| Key | Type | Description |
| --- | --- | --- |
| `attrs` | String | Comma-separated list of the Data Source attributes to return in the response. Example: `attrs=name,id,engine`. Leave empty to return a full response. |

Note that the `statistics` `bytes` and `row_count` attributes might be `null` depending on how the Data Source was created.
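For instance, to request only a few attributes per Data Source and read the names out of a response shaped like the example above, a sketch could look like this (the truncated payload is illustrative):

```python
import json
from urllib.parse import urlencode

# Ask only for name, id and engine in each returned Data Source.
params = {"attrs": "name,id,engine"}
list_url = "https://api.tinybird.co/v0/datasources?" + urlencode(params)

# Parsing a (truncated) response with the shape shown above:
sample = json.loads(
    '{"datasources": [{"id": "t_a049", "name": "your_datasource_name"}]}'
)
names = [ds["name"] for ds in sample["datasources"]]
```

GET `list_url` with an `Authorization: Bearer <DATASOURCES:READ token>` header to fetch the real listing.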
- POST /v0/datasources/(.+)/alter
Modify the Data Source schema.
This endpoint lets you alter the schema, description, Kafka storage options, and TTL of a Data Source:
Request parameters

| Key | Type | Description |
| --- | --- | --- |
| `schema` | String | Optional. Set the whole schema; new columns are added after the existing ones of the Data Source. |
| `description` | String | Optional. Sets the description of the Data Source. |
| `kafka_store_raw_value` | Boolean | Optional. Default: `false`. When set to `true`, the 'value' column of a Kafka Data Source saves the JSON as a raw string. |
| `kafka_store_headers` | Boolean | Optional. Default: `false`. When set to `true`, the 'headers' of a Kafka Data Source are saved as a binary map. |
| `ttl` | String | Optional. Set to any value accepted in ClickHouse for a TTL, or to 'false' to remove the TTL. See the examples below. |
| `dry` | Boolean | Optional. Default: `false`. Set to `true` to show what would be modified in the Data Source, without running any modification at all. |
The schema parameter can be used to add new columns at the end of the existing ones in a Data Source.
Be aware that Tinybird currently doesn't validate whether the change will affect existing Materialized Views (MVs) attached to the Data Source, so this change may break existing MVs. For example, avoid changing a Data Source that has an MV created with something like `SELECT * FROM datasource ...`. If you want MVs to be forward compatible with column additions, create them specifying the columns instead of using the `*` operator.

Also take into account that, for now, the only engines supporting adding new columns are those inside the MergeTree family.
To add a column to a Data Source, call this endpoint with the Data Source name and the new schema definition.
For example, having a Data Source created like this:
Creating a Data Source from a schema

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d "name=stocks" \
  -d "schema=symbol String, date Date, close Float32"
```
If you want to add a new column 'concept String', call this endpoint with the new schema:
Adding a new column to an existing Data Source

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "schema=symbol String, date Date, close Float32, concept String"
```
If everything goes well, the response lists the operations performed:
ADD COLUMN operation resulting from the schema change

```json
{
  "operations": [
    "ADD COLUMN `concept` String"
  ]
}
```
You can also view the inferred operations without executing them by adding `dry=true` to the parameters.

To modify the description of a Data Source:
Modifying the description of a Data Source

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "description=My new description"
```
To save the JSON as a raw string in the "value" column of a Kafka Data Source:
Saving the raw string in the value column of a Kafka Data Source

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "kafka_store_raw_value=true" \
  -d "kafka_store_headers=true"
```
To modify the TTL of a Data Source:
Modifying the TTL of a Data Source

```shell
curl \
  -H "Authorization: Bearer $TB_TOKEN" \
  -X POST "https://api.us-west-2.aws.tinybird.co/v0/datasources/<DATASOURCE_NAME>/alter" \
  --data-urlencode 'ttl=`timestamp` + toIntervalHour(12)'
```
TTL syntax guidelines:

- Use a time-based column from the Data Source schema (for example, a DateTime column).
- Express the TTL in valid ClickHouse syntax using toInterval*() functions.
- URL-encode the expression to ensure it's parsed correctly.
- Supported intervals include toIntervalMinute(), toIntervalHour(), toIntervalDay(), toIntervalMonth(), etc.
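Because the TTL expression contains backticks, spaces, and parentheses, it must be percent-encoded before it goes into the request body. A small sketch of what that guideline means in practice, equivalent to what curl's `--data-urlencode` does for you:

```python
from urllib.parse import quote

# ClickHouse TTL expression referencing a DateTime column of the schema.
ttl = "`timestamp` + toIntervalHour(12)"

# Percent-encode the expression and prepend the parameter name.
body = "ttl=" + quote(ttl)
# POST this body to /v0/datasources/<DATASOURCE_NAME>/alter.
```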
To remove the TTL of a Data Source:

Removing the TTL of a Data Source

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "ttl=false"
```
To add default values to the columns of a Data Source:
Modifying default values

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "schema=symbol String DEFAULT '-', date Date DEFAULT now(), close Float32 DEFAULT 1.1"
```
To add default values to the columns of an NDJSON Data Source, add the DEFAULT definition after the jsonpath definition:
Modifying default values in an NDJSON Data Source

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "schema=symbol String `json:$.symbol` DEFAULT '-', date Date `json:$.date` DEFAULT now(), close Float32 `json:$.close` DEFAULT 1.1"
```
To make a column nullable, change the type of the column by wrapping the old type in `Nullable()`:
Converting column "close" to Nullable

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "schema=symbol String `json:$.symbol`, date Date `json:$.date`, close Nullable(Float32) `json:$.close`"
```
To drop a column, remove the column from the schema definition. Columns that are part of the primary or partition key can't be removed:
Removing column "close" from the Data Source

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "schema=symbol String `json:$.symbol`, date Date `json:$.date`"
```
You can also alter the JSONPaths of existing Data Sources. In that case you have to specify the JSONPath in the schema in the same way as when you created the Data Source.
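Since schema changes can break attached Materialized Views, it can be worth previewing an alter with `dry=true` before running it. A hedged sketch of such a request (the Data Source name and schema reuse the examples above; nothing is actually sent or modified):

```python
from urllib.parse import urlencode

# Preview the operations an alter would perform, without applying them.
payload = urlencode({
    "schema": "symbol String, date Date, close Float32, concept String",
    "dry": "true",  # show the inferred operations only
})
alter_url = "https://api.tinybird.co/v0/datasources/stocks/alter"
# POST payload to alter_url; the response lists operations such as
# ADD COLUMN statements without executing them.
```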
- POST /v0/datasources/(.+)/truncate
Truncates a Data Source in your account. If the Data Source has dependent Materialized Views, those won't be truncated in cascade; to delete data from dependent Materialized Views, make a subsequent call to this method for each of them. The auth token in use must have the `DATASOURCES:CREATE` scope.

Truncating a Data Source

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X POST "https://api.tinybird.co/v0/datasources/name/truncate"
```
This also works for the `quarantine` table of a Data Source. Remember that the quarantine table for a Data Source has the same name with the `_quarantine` suffix.

Truncating the quarantine table of a Data Source

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:DROP token>" \
  -X POST "https://api.tinybird.co/v0/datasources/:name_quarantine/truncate"
```
- POST /v0/datasources/(.+)/delete
Deletes rows from a Data Source in your account given a SQL condition. The auth token in use must have the `DATASOURCES:CREATE` scope.

Deleting rows from a Data Source given a SQL condition

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  --data "delete_condition=(country='ES')" \
  "https://api.tinybird.co/v0/datasources/:name/delete"
```
When deleting rows from a Data Source, the response is not the final result of the deletion but a Job. You can check the job status and progress using the Jobs API. In the response, `id`, `job_id`, and `delete_id` have the same value:

Delete API Response

```json
{
  "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
  "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
  "job_url": "https://api.tinybird.co/v0/jobs/64e5f541-xxxx-xxxx-xxxx-00524051861b",
  "job": {
    "kind": "delete_data",
    "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "status": "waiting",
    "created_at": "2023-04-11 13:52:32.423207",
    "updated_at": "2023-04-11 13:52:32.423213",
    "started_at": null,
    "is_cancellable": true,
    "datasource": {
      "id": "t_c45d5ae6781b41278fcee365f5bxxxxx",
      "name": "shopping_data"
    },
    "delete_condition": "event = 'search'"
  },
  "status": "waiting",
  "delete_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b"
}
```
To check on the progress of the delete job, use the `job_id` from the Delete API response to query the Jobs API. For example, to check on the status of the above delete job:
Checking the status of the delete job

```shell
curl \
  -H "Authorization: Bearer <TOKEN>" \
  "https://api.tinybird.co/v0/jobs/64e5f541-xxxx-xxxx-xxxx-00524051861b"
```
Would respond with:
Job API Response

```json
{
  "kind": "delete_data",
  "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
  "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
  "status": "done",
  "created_at": "2023-04-11 13:52:32.423207",
  "updated_at": "2023-04-11 13:52:37.330020",
  "started_at": "2023-04-11 13:52:32.842861",
  "is_cancellable": false,
  "datasource": {
    "id": "t_c45d5ae6781b41278fcee365f5bc2d35",
    "name": "shopping_data"
  },
  "delete_condition": "event = 'search'",
  "rows_affected": 100
}
```
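The waiting-to-done lifecycle above can be polled programmatically. A minimal sketch with the HTTP call injected as a function, so the polling logic stays independent of any particular client library (the function names here are illustrative, not part of the API):

```python
import time

def wait_for_job(fetch_job, job_id, interval=1.0, timeout=60.0):
    """Poll the Jobs API until a delete job finishes.

    fetch_job(job_id) is expected to return the parsed JSON of
    GET /v0/jobs/<job_id>, i.e. a dict whose "status" key moves
    from "waiting"/"working" to "done" or "error".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_job(job_id)
        if job["status"] in ("done", "error"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish in {timeout}s")
```

In real use, `fetch_job` would issue the authenticated GET request shown in the curl example above.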
Data Source engines supported
Tinybird uses ClickHouse as the underlying storage technology. ClickHouse features different strategies to store data; these strategies define not only where and how the data is stored but also what kind of data access, queries, and availability your data has. In ClickHouse terms, a Tinybird Data Source uses a Table Engine that determines those factors.
Currently, Tinybird supports deleting data for data sources with the following Engines:
MergeTree
ReplacingMergeTree
SummingMergeTree
AggregatingMergeTree
CollapsingMergeTree
VersionedCollapsingMergeTree
Dependent views deletion
If the Data Source has dependent Materialized Views, those won't be cascade deleted. To delete data from dependent Materialized Views, make a subsequent call to this method for each affected view with a proper `delete_condition`. This applies as well to the associated `quarantine` Data Source.

Request parameters

| Key | Type | Description |
| --- | --- | --- |
| `delete_condition` | String | Mandatory. A string representing the WHERE SQL clause you'd add to a regular `DELETE FROM <table> WHERE <delete_condition>` statement. Most of the time you'll write a simple `delete_condition` such as `column_name=value`, but any valid SQL expression, including conditional operators, is valid. |
| `dry_run` | String | Default: `false`. Allows you to test the deletion. When using `true`, it executes all deletion validations and returns the number of matched `rows_to_be_deleted`. |
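Combining the two parameters, a dry run of a deletion could be issued like this (a sketch; nothing is actually deleted, and the `:name` placeholder is kept as in the examples above):

```python
from urllib.parse import urlencode

# Validate the condition and count rows_to_be_deleted without deleting.
payload = urlencode({
    "delete_condition": "(country='ES')",
    "dry_run": "true",
})
delete_url = "https://api.tinybird.co/v0/datasources/:name/delete"
# POST payload to delete_url with a DATASOURCES:CREATE token.
```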
- GET /v0/datasources/(.+)
- Getting information about a particular Data Source

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:READ token>" \
  -X GET "https://api.tinybird.co/v0/datasources/datasource_name"
```
Get Data Source information and stats. The token provided must have read access to the Data Source.
Successful response

```json
{
  "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea",
  "name": "datasource_name",
  "statistics": {
    "bytes": 430833,
    "row_count": 3980
  },
  "used_by": [{
    "id": "t_efdc62b5e67142bd9bf9a7f113a34353",
    "name": "pipe_using_datasource_name"
  }],
  "updated_at": "2018-09-07 23:50:32.322461",
  "created_at": "2018-11-28 23:50:32.322461",
  "type": "csv"
}
```
Request parameters

| Key | Type | Description |
| --- | --- | --- |
| `attrs` | String | Comma-separated list of the Data Source attributes to return in the response. Example: `attrs=name,id,engine`. Leave empty to return a full response. |

`id` and `name` are two ways to refer to the Data Source in SQL queries and API endpoints. The only difference is that the `id` never changes: it keeps working even if you change the `name` (which is the name used to display the Data Source in the UI). In general you can use `id` or `name` interchangeably. Using the above response as an example:

`select count(1) from datasource_name`

is equivalent to

`select count(1) from t_bd1c62b5e67142bd9bf9a7f113a2b6ea`

The id `t_bd1c62b5e67142bd9bf9a7f113a2b6ea` is not a descriptive name, so you can add a description like `t_my_events_datasource.bd1c62b5e67142bd9bf9a7f113a2b6ea`.

The `statistics` property contains information about the table. Those numbers are an estimation: `bytes` is the estimated data size on disk and `row_count` the estimated number of rows. These statistics are updated whenever data is appended to the Data Source.

The `used_by` property contains the list of Pipes that are using this Data Source. Only the Pipe `id` and `name` are sent.

The `type` property indicates the `format` used when the Data Source was created. Available formats are `csv`, `ndjson`, and `parquet`. The Data Source `type` indicates what file format you can use to ingest data.
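As a small illustration of the properties just described, here is how the statistics and dependencies could be read out of the GET response, using a trimmed copy of the sample payload shown above:

```python
import json

response = json.loads("""{
  "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea",
  "name": "datasource_name",
  "statistics": {"bytes": 430833, "row_count": 3980},
  "used_by": [{"id": "t_efdc62b5e67142bd9bf9a7f113a34353",
               "name": "pipe_using_datasource_name"}],
  "type": "csv"
}""")

estimated_rows = response["statistics"]["row_count"]  # an estimation, not exact
dependent_pipes = [p["name"] for p in response["used_by"]]
ingest_format = response["type"]  # csv, ndjson or parquet
```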
- DELETE /v0/datasources/(.+)
- Dropping a Data Source

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:DROP token>" \
  -X DELETE "https://api.tinybird.co/v0/datasources/:name"
```
Drops a Data Source from your account.
Request parameters

| Key | Type | Description |
| --- | --- | --- |
| `force` | String | Default: `false`. The `force` parameter is taken into account when trying to delete Materialized Views. By default (`false`) the deletion is not carried out; enable it by setting it to `true`. If the given Data Source is being used as the trigger of a Materialized Node, it will not be deleted in any case. |
| `dry_run` | String | Default: `false`. Allows you to test the deletion. When using `true`, it executes all deletion validations and returns the possibly affected materializations and other dependencies of the given Data Source. |
| `token` | String | Auth token. Only required if no Bearer Authorization header is sent. It must have the `DROP:datasource_name` scope for the given Data Source. |
- PUT /v0/datasources/(.+)
Update Data Source attributes
Updating the name of a Data Source

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X PUT "https://api.tinybird.co/v0/datasources/:name?name=new_name"
```

Promoting a Data Source to a Snowflake one

```shell
curl \
  -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
  -X PUT "https://api.tinybird.co/v0/datasources/:name" \
  -d "connector=1d8232bf-2254-4d68-beff-4dd9aa505ab0" \
  -d "service=snowflake" \
  -d "cron=*/30 * * * *" \
  -d "query=select a, b, c from test" \
  -d "mode=replace" \
  -d "external_data_source=database.schema.table" \
  -d "ingest_now=True"
```
Request parameters

| Key | Type | Description |
| --- | --- | --- |
| `name` | String | New name for the Data Source. |
| `token` | String | Auth token. Only required if no Bearer Authorization header is sent. It should have the `DATASOURCES:CREATE` scope for the given Data Source. |
| `connector` | String | Connector ID to link the Data Source to. |
| `service` | String | Type of service to promote it to. Only 'snowflake' or 'bigquery' are allowed. |
| `cron` | String | Cron-like pattern to execute the connector's job. |
| `query` | String | Optional. Custom query to collect from the external data source. |
| `mode` | String | Only `replace` is allowed for connectors. |
| `external_data_source` | String | External data source to use for Snowflake. |
| `ingest_now` | Boolean | Set to `True` to ingest the data immediately instead of waiting for the first execution determined by `cron`. |