
Replacing and deleting data in your Tinybird Data Sources

Intermediate
Data deletion operations are common in the transactional databases where your operational data lives, but data quality processes in your operational database will often require you to update or delete the analytical data you keep in Tinybird as well.

Whether it's because some of the applications ingesting your operational data were buggy, because of a transient error while operating the production database, or because of a change in regulation, you might need the ability to delete unneeded data or replace outdated data that is influencing your analyses in Tinybird.

While real-time analytical databases are optimized for SELECTs and INSERTs, we fully support the other operations needed in data quality management processes. We do that by hiding the complexity of data replication, partition management and mutation rewriting, so you only have to worry about your data engineering flows and not about the internals of real-time analytical databases. This guide will show you how to selectively delete or update data in Tinybird using the REST API.

Delete data selectively

Conditional data deletion works by sending a POST request to the delete API endpoint, providing the name of one of your Data Sources in Tinybird and a {% code-line %}delete_condition{% code-line-end %} parameter, which is an SQL filter expression. This operation is not supported in either the User Interface or the CLI (yet).

Imagine you have a Data Source called {% code-line %}events{% code-line-end %} (you can always follow the ingesting data guide to learn how to create one) and you want to remove all the transactions for November 2019. You'd send a POST request like this:

{% tip-box title="Use a token with the right scope" %}Replace {% code-line %}<your_token>{% code-line-end %} with a token whose scope is {% code-line %}DATASOURCES:CREATE{% code-line-end %} or {% code-line %}ADMIN{% code-line-end %}.{% tip-box-end %}
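The following is a sketch of that request. It assumes the {% code-line %}api.tinybird.co{% code-line-end %} host and the delete endpoint of the Data Sources API, and it reuses the {% code-line %}date{% code-line-end %} column from the replacement example later in this guide; check the API reference for the exact path and adapt the condition to your schema:

```bash
# Delete all rows from the "events" Data Source matching the SQL filter expression.
# The host and endpoint path are assumptions; adjust them to your region / API reference.
curl \
  -H "Authorization: Bearer <your_token>" \
  -X POST "https://api.tinybird.co/v0/datasources/events/delete" \
  --data-urlencode "delete_condition=toDate(date) >= '2019-11-01' and toDate(date) <= '2019-11-30'"
```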

The {% code-line %}POST{% code-line-end %} request to the delete API endpoint is asynchronous. It returns a job response that includes an ID for the job, the status of the job, the {% code-line %}delete_condition{% code-line-end %} and some other metadata. Although the request returns immediately (hence the job response), the deletion is not finished until all the mutations have been rewritten and the data has been removed from every replica.

You can periodically poll the {% code-line %}job_url{% code-line-end %} with the given ID to check the status of the deletion process. When it is {% code-line %}done{% code-line-end %}, the data matching the SQL filter expression has been removed and all your Pipes and API Endpoints will keep running with the remaining data in the Data Source.
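For example, this is a sketch of how you could poll the job. It assumes the {% code-line %}/v0/jobs{% code-line-end %} endpoint on the {% code-line %}api.tinybird.co{% code-line-end %} host, and {% code-line %}<job_id>{% code-line-end %} is a placeholder for the ID returned in the job response; check the Jobs API reference for the exact path:

```bash
# Check the status of the deletion job until it reports "done" (or "error").
# Host and endpoint path are assumptions; verify them against the Jobs API reference.
curl \
  -H "Authorization: Bearer <your_token>" \
  "https://api.tinybird.co/v0/jobs/<job_id>"
```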

Truncate a Data Source

Sometimes you just want to delete all the data contained in a Data Source, usually to start again from scratch. You can do so from the User Interface – from the options dropdown in the Data Source modal window – or via the API. Using the API, the truncate endpoint deletes all the rows in a Data Source and can be called as follows:

{% tip-box title="Use a token with the right scope" %}Replace {% code-line %}<your_token>{% code-line-end %} with a token whose scope is {% code-line %}DATASOURCES:CREATE{% code-line-end %} or {% code-line %}ADMIN{% code-line-end %}.{% tip-box-end %}
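A sketch of that call, assuming the truncate endpoint of the Data Sources API on the {% code-line %}api.tinybird.co{% code-line-end %} host (check the API reference for the exact path):

```bash
# Remove every row from the "events" Data Source, keeping its schema in place.
# Host and endpoint path are assumptions; adjust them to your region / API reference.
curl \
  -H "Authorization: Bearer <your_token>" \
  -X POST "https://api.tinybird.co/v0/datasources/events/truncate"
```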

If you aren't in the mood for coding, you can also truncate a Data Source directly from the UI:

Deleting selectively is only available via the API, but truncating a Data Source to delete all of its data can also be done from the UI.

Replace data selectively

The ability to update data is often not the top priority when designing analytical databases, but there are always scenarios where you need to update or replace your analytical data: you might have reconciliation processes over your transactions that affect the original data, or maybe your ingestion process was simply faulty and ingested inaccurate data for a period of time.

In Tinybird it's possible to specify a condition so that only part of the data is replaced during the ingestion process. For instance, let's say you want to re-ingest a CSV with the data for November 2019 and update your Data Source accordingly. To update the data, you just need to pass the {% code-line %}replace_condition{% code-line-end %} parameter with the {% code-line %}toDate(date) >= '2019-11-01' and toDate(date) <= '2019-11-30'{% code-line-end %} condition.

{% tip-box title="Use a token with the right scope" %}Replace {% code-line %}<your_token>{% code-line-end %} with a token whose scope is {% code-line %}DATASOURCES:CREATE{% code-line-end %} or {% code-line %}ADMIN{% code-line-end %}.{% tip-box-end %}
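A sketch of that request, assuming the Data Sources import endpoint with {% code-line %}mode=replace{% code-line-end %} on the {% code-line %}api.tinybird.co{% code-line-end %} host; the CSV URL is a hypothetical placeholder, so check the API reference for the exact parameters:

```bash
# Replace only the rows matching replace_condition with the contents of the CSV.
# The CSV URL is a hypothetical placeholder; host and parameters are assumptions.
curl \
  -H "Authorization: Bearer <your_token>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d "mode=replace" \
  -d "name=events" \
  --data-urlencode "replace_condition=toDate(date) >= '2019-11-01' and toDate(date) <= '2019-11-30'" \
  --data-urlencode "url=https://example.com/events_november_2019.csv"
```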

The response you'd get after making the previous API call is another job response, like the one you get for a selective deletion.

As in the case of selective deletion, a selective replacement also runs as an asynchronous job, so we recommend checking the status of the job periodically. You can easily do so by going to the {% code-line %}job_url{% code-line-end %} returned in the previous response.

{% tip-box title="ENSURE YOU HAVE THE RIGHT PERMISSIONS" %}Please note that the provided Auth token must have the DATASOURCES:CREATE scope.{% tip-box-end %}

Replace a Data Source completely

To replace a complete Data Source, you'd make an API call similar to the previous one, without providing a {% code-line %}replace_condition{% code-line-end %}:

{% tip-box title="Use a token with the right scope" %}Replace {% code-line %}<your_token>{% code-line-end %} with a token whose scope is {% code-line %}DATASOURCES:CREATE{% code-line-end %} or {% code-line %}ADMIN{% code-line-end %}.{% tip-box-end %}
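A sketch of that call, with the same assumptions as the previous example (the CSV URL is again a hypothetical placeholder):

```bash
# Replace the whole "events" Data Source with the contents of the CSV at the given URL.
# The CSV URL is a hypothetical placeholder; host and parameters are assumptions.
curl \
  -H "Authorization: Bearer <your_token>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d "mode=replace" \
  -d "name=events" \
  --data-urlencode "url=https://example.com/events.csv"
```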

The request above replaces a Data Source with the data found at a given URL pointing to a CSV file, but you can also do it through the User Interface as follows:

Replacing a Data Source completely can also be done through the User Interface

{% tip-box title="SCHEMAS MUST BE THE SAME" %}When replacing data (either selectively or a whole Data Source) you need to ensure your Data Schema is the same as in the original Data Source. Rows not containing the same schema will go to quarantine{% tip-box-end %}
