Feb 21, 2025

The perfect data ingestion API design

If you ask me, this is pretty much perfect.
Javier Santana
Co-founder


The perfect data ingestion API design... does not exist 🙂.

I used the title to catch your attention, but I do think I’ve designed something close to perfect. Check it out and tell me what you'd change.

Easy to use

You can send data from any programming language in a few lines of code, as in the sketch below.
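For example, here's a minimal sketch in Python using only the standard library. The endpoint URL, data source name, and token are placeholders for illustration, not the exact API:

```python
import json
import urllib.request

# Placeholders: the URL, data source name, and token are assumptions for
# illustration, not a specific product's actual API.
URL = "https://api.example.com/v0/events?name=page_views"
TOKEN = "YOUR_INGEST_TOKEN"

events = [
    {"timestamp": "2025-02-21T10:00:00Z", "path": "/pricing", "user_id": 42},
    {"timestamp": "2025-02-21T10:00:01Z", "path": "/docs", "user_id": 7},
]

# NDJSON: one JSON object per line.
payload = "\n".join(json.dumps(e) for e in events).encode("utf-8")

req = urllib.request.Request(
    URL,
    data=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```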

A format for the web

It accepts NDJSON and JSON. Maybe I'd add support for Parquet, but I think compressed NDJSON is good enough. 

Being web-compatible allows you to connect almost any kind of webhook. Or send it from a JavaScript snippet.
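As a sketch of the webhook case: whatever JSON body a webhook delivers can be forwarded to the ingestion endpoint as-is. The function, URL, and token below are hypothetical; in a browser, the same thing is a one-line fetch():

```python
import json
import urllib.request

def forward_webhook(body: dict) -> int:
    """Forward a webhook's JSON payload to the ingestion endpoint unchanged."""
    req = urllib.request.Request(
        "https://api.example.com/v0/events?name=webhooks",  # placeholder URL
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": "Bearer YOUR_INGEST_TOKEN"},  # placeholder token
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```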

Schema >>> schemaless

When working with a lot of data, schemaless is a waste of money and resources, both in storage and processing. The API transforms event attributes into columns (stored with the right type in a columnar database), leading to 10x-100x improvements in both.
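Purely as an illustration of that mapping (the column types here are my guess, not what the API actually infers):

```python
# One incoming event...
event = {
    "timestamp": "2025-02-21T10:00:00Z",
    "path": "/pricing",
    "user_id": 42,
    "duration_ms": 12.5,
}

# ...ends up as typed columns in the columnar store, roughly like:
schema = {
    "timestamp": "DateTime",
    "path": "String",
    "user_id": "Int64",
    "duration_ms": "Float64",
}
```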

You can always save the raw data to process it later but, in general, it’s a bad idea.

ACK

The API sends you an ack when the data is received and safely stored. Once you get it, you can forget about the data: you know it will eventually be written to the database.

Failing gracefully

Things fail, and this is the most interesting part. If an insert fails, you want to know with 100% certainty. And if your app dies while you are pushing data, should you retry?

The API is idempotent. You can retry within a 5-hour window, and if the data was already inserted, it won't be inserted again, as long as you send the same data batch (the API uses a hash of the data to know whether it was already inserted).
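A retry loop only needs to keep the batch around until it gets the ack, and resend exactly the same bytes so the server-side hash can recognize a duplicate. A minimal sketch, with the same placeholder URL and token as above:

```python
import json
import time
import urllib.error
import urllib.request

URL = "https://api.example.com/v0/events?name=page_views"  # placeholder
TOKEN = "YOUR_INGEST_TOKEN"                                # placeholder

def send_batch_with_retries(events: list[dict], max_attempts: int = 5) -> None:
    # Serialize once so every attempt sends byte-for-byte the same batch;
    # the server can then hash the payload and recognize a retry of an
    # already-inserted batch.
    payload = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    for attempt in range(max_attempts):
        req = urllib.request.Request(
            URL,
            data=payload,
            headers={"Authorization": f"Bearer {TOKEN}"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req, timeout=10):
                return  # 2xx ack: safe to forget about this batch
        except urllib.error.HTTPError as err:
            if err.code < 500:
                raise  # client error: retrying the same payload won't help
            # 5xx: server-side trouble, retry the identical batch
        except (urllib.error.URLError, OSError):
            pass  # network error or timeout: unclear if it landed, so retry
        time.sleep(2 ** attempt)  # exponential backoff between attempts
    raise RuntimeError("batch not acknowledged after retries")
```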

The first layer of the API is deliberately simple, so if something does fail internally, in almost every case the data is at least buffered.

Buffering

Speaking of buffering... the API does buffer data. This is generally good performance hygiene for an ingestion API, but it's also critical if you have an analytical database (as we do). These databases aren't built to accept streaming inserts; they need to insert data in batches, otherwise it's too expensive (in both CPU and S3 write operations).

This buffer layer also works as a safety net when things fail. For example, it's quite easy to overload a database; the buffer helps you mitigate that without even noticing.
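Conceptually, the buffer is just "collect events, flush in batches by size or age". Here's a client-side sketch of that pattern; it's an illustration of the idea, not the API's actual internals:

```python
import time
from typing import Callable

class Buffer:
    """Collect events one by one, flush them to a sink in batches."""

    def __init__(
        self,
        sink: Callable[[list[dict]], None],
        max_events: int = 10_000,
        max_age_s: float = 4.0,
    ):
        self.sink = sink
        self.max_events = max_events
        self.max_age_s = max_age_s
        self.events: list[dict] = []
        self.first_event_at: float | None = None

    def append(self, event: dict) -> None:
        if self.first_event_at is None:
            self.first_event_at = time.monotonic()
        self.events.append(event)
        if self._should_flush():
            self.flush()

    def _should_flush(self) -> bool:
        too_big = len(self.events) >= self.max_events
        too_old = (
            self.first_event_at is not None
            and time.monotonic() - self.first_event_at >= self.max_age_s
        )
        return too_big or too_old

    def flush(self) -> None:
        if not self.events:
            return
        batch, self.events, self.first_event_at = self.events, [], None
        # One batched insert instead of thousands of tiny ones.
        self.sink(batch)

# Usage: the sink here just prints; a real one would write to the database.
buf = Buffer(sink=lambda batch: print(f"flushing {len(batch)} events"))
for i in range(25_000):
    buf.append({"event_id": i})
buf.flush()  # flush whatever is left at shutdown
```

The database then sees a few large inserts instead of a flood of tiny ones, which is what keeps CPU and S3 write costs under control.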

Scale

You can throw 1,000 QPS at it with one event each, or 200 QPS with a 50 MB payload each. Even if you have a lot of data, that covers at least 99% of use cases.

Real time

Even with some buffering, it works in real time. It usually takes no more than 4 seconds for the data to be available to query from the database, but even that can be reduced to close to a second. 

And in general, it just works.

Try it

What do you think? Is it the perfect data ingestion API? Try it out and let me know.
