Data sources¶
When you send data to Tinybird, it's stored in a data source. You then write SQL queries to publish API endpoints that will serve that data.
For example, if your event data lives in a Kafka topic, you can create a data source that connects directly to Kafka and writes the events to Tinybird. Similarly, you can send events or data from a file.
There are also intermediate data sources that are the result of materialization or a copy pipe.
Data sources are defined in .datasource files.
sample.datasource
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`
See all syntax options in the Reference.
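As an illustration, a row matching the schema above could be sent with the Events API; the data source name and token below are placeholders:

```shell
curl \
  -X POST 'https://api.tinybird.co/v0/events?name=sample' \
  -H "Authorization: Bearer $TB_TOKEN" \
  -d '{"timestamp":"2024-06-01 12:00:00","session_id":"abc123","action":"click","version":"1.0.0","payload":"{}"}'
```

Rows that don't match the schema are routed to the quarantine Data Source instead of failing the request.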
Create Data Sources¶
To create a new data source, you can manually define the .datasource file or run the tb datasource create command:
tb datasource create
Once you run the command, you’ll be asked which type of data source you want to create.
- Blank. Generates a .datasource file with a couple of example columns you can edit.
- Local file. Creates a .datasource file based on the schema of a file you have locally.
- Remote URL. Creates a .datasource file based on the schema of a file from a remote URL.
- Kafka. Creates a data source designed to work with a Kafka connection. If you don’t have one yet, you’ll need to create it first, since the schema is built from the topic you select.
- Amazon S3. For working with an S3 connection. You’ll need an existing connection, as the schema is built from the file in the bucket you choose.
- GCS. Same idea as S3, but for Google Cloud Storage. Make sure you have a connection set up first.
You can run tb datasource create -h anytime to see this list in the command help.
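As a sketch of the Kafka case, the generated file pairs a schema with the connection settings; the connection, topic, and group names below are placeholders, and the exact KAFKA_* settings for your version are listed in the Reference:

```
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `payload` String `json:$.payload`

KAFKA_CONNECTION_NAME my_kafka_connection
KAFKA_TOPIC events
KAFKA_GROUP_ID events_group
```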
Delete Data Sources¶
To delete a data source in Tinybird, remove its corresponding .datasource file and deploy your changes using the --allow-destructive-operations flag to confirm the removal:
tb deploy --allow-destructive-operations
This operation will permanently remove the data source and all its data from your Tinybird workspace. Make sure to review dependencies such as pipes or materialized views that might rely on the data source before deleting it.
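A typical deletion looks like this, assuming the project keeps its definitions under a datasources/ directory:

```shell
# Remove the definition from the project
rm datasources/sample.datasource

# Deploy, explicitly confirming the destructive change
tb deploy --allow-destructive-operations
```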
Share a Data Source¶
Workspace administrators can share a Data Source with another Workspace they have access to in the same organization.
To share a Data Source, add the destination workspace(s) to the .datasource file you want to share. For example:
origin_datasource.datasource
# ... data source definition ...

SHARED_WITH >
    <destination_workspace>, <other_destination_workspace>
And then deploy your changes.
You can use the shared Data Source to create Pipes in the target Workspace. Users that have access to a shared Data Source can access the tinybird.datasources_ops_log and tinybird.kafka_ops_log Service Data Sources.
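As a minimal sketch, a pipe in the target Workspace queries the shared Data Source like any local one; the node name, Data Source name, and pipe layout below are illustrative placeholders, so check the pipe syntax in the Reference:

```
NODE count_events
SQL >
    SELECT count() AS total FROM shared_events

TYPE endpoint
```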
Limitations¶
The following limitations apply to shared Data Sources:
- Shared Data Sources are read-only.
- You can't share a shared Data Source, only the original.
- You can't check the quarantine of a shared Data Source.
- You can't create a Materialized View from a shared Data Source.
Working locally with Shared Data Sources¶
When a workspace shares a Data Source with another workspace, you need to pay attention to the order in which you deploy them. Say you have Workspace A sharing a Data Source with Workspace B, and Workspace B uses that Data Source in an endpoint. If you start with a fresh Tinybird Local instance without any of the workspaces, you will have to:
1. tb deploy workspace B. This will fail because it's using a shared Data Source that's not accessible, but the workspace will be created empty.
2. tb deploy workspace A.
3. tb deploy workspace B. Now, the shared Data Source is available and the deployment will succeed.
tb build hides all this complexity and creates the necessary workspaces and Data Sources to verify that a workspace is valid.
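The deployment ordering above can be scripted; workspace_a and workspace_b are placeholder project directories:

```shell
# 1. Deploy Workspace B: fails on the missing shared Data Source,
#    but the empty workspace is created
(cd workspace_b && tb deploy) || true

# 2. Deploy Workspace A, which shares the Data Source
(cd workspace_a && tb deploy)

# 3. Deploy Workspace B again: the shared Data Source now resolves
(cd workspace_b && tb deploy)
```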
Keeping .datasource files up-to-date¶
Run tb [--cloud] pull --only-vendored to update the .datasource files of Data Sources shared with your workspace. They will be placed in vendor/<name_of_the_source_workspace>/datasources.
You can only deploy your project if the files in vendor/ are up-to-date. If they aren't, your deployment will fail and you'll be prompted to run the aforementioned command.
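After pulling, the vendored files land in a layout like this (the workspace and Data Source names are placeholders):

```
vendor/
└── workspace_a/
    └── datasources/
        └── shared_events.datasource
```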
Quarantine Data Sources¶
Every Data Source you create in your Workspace has a quarantine Data Source associated that stores data that doesn't fit the schema. If you send rows that don't fit the Data Source schema, they're automatically sent to the quarantine table so that the ingest process doesn't fail.
See the Quarantine page for more details.
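For example, assuming a Data Source named sample and assuming its quarantine table follows the <name>_quarantine naming convention, you could inspect rejected rows from the CLI:

```shell
tb sql "SELECT * FROM sample_quarantine LIMIT 10"
```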