The data rules worth $40,000 a day

A "real-world" example of how data best practices can seriously shrink your usage bill.
Aitana Azcona, Head of Base Team

What if I told you I could save you $40,000 a day? Would you think I’m crazy?

$40,000 may be an astronomical number, but with what some companies are spending on cloud data warehouse egress costs, the order of magnitude perhaps isn’t so far out of the realm of reality.

Regardless of the dollar amount you assign to the problem, the point is this: if you don’t take care to craft your data pipelines, you’re going to overspend. $40K a day is extreme, but bad data management has a cost. A chorus of data engineers and data practitioners, including people like Chad Sanderson and Lauren Balik, reminds us of this almost daily.

How you build analytical queries - and specifically how you manage the amount of data that they process to get the answers you want - can have a huge impact on your cloud bill. It’s critically important that you have robust observability measures to alert you to requests consuming excess data - and the “tax” you’re paying to run them.

At Tinybird we always say to work backwards. Build a functional endpoint, then work hard and fast to optimize it so it processes as little data as possible when it’s queried in production.

So here is a real - albeit extreme - example of putting that to work, and how it impacts costs.

If video is more your thing, you can watch my talk on this subject instead.

The data

Consider an example data source: an audit log containing 258M rows and nearly 14 GB of product usage data, sorted and indexed by company_id. This is a very realistic data set: if my product had 10,000 daily active users generating 100 events per day, that’s 1M rows a day, and I’d hit 258M rows in well under a year.

I want to know how many events of a particular type occurred during a particular time period, grouped by country. Seems pretty simple, right?

Here’s a diagram of how I might go about that:

Figure: the initial data transformation flow. The initial transformation process is inefficient.

I start with the events data, join it to a table containing country data on an author_id, aggregate, then filter to the timestamps and event types I want. Easy peasy, and here’s how that looks in a Tinybird Pipe (a chain of nodes containing iterative SQL queries):
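Here’s a minimal sketch of that Pipe. The table and column names (events, country_data), the 'login' event type, and the date range are all illustrative:

```
NODE join_events
SQL >
    -- Join the entire events table to country data before doing anything else
    SELECT e.author_id, e.event, e.datetime, c.country
    FROM events e
    LEFT JOIN country_data c ON c.author_id = e.author_id

NODE aggregate_events
SQL >
    -- Aggregate everything, across all event types and all time
    SELECT country, event, toDate(datetime) AS date, count() AS total
    FROM join_events
    GROUP BY country, event, date

NODE endpoint
SQL >
    -- Only now filter down to the slice I actually want
    SELECT country, sum(total) AS total
    FROM aggregate_events
    WHERE event = 'login'
      AND date >= '2023-01-01' AND date < '2023-02-01'
    GROUP BY country
```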

But hold on a minute. If you took the time to read through those queries, or if you’re a savvy data engineer, you probably already spotted the problems. This approach is really inefficient. In fact, this query scans 6.67 GB every time it’s called and takes 2.41 s to complete. Yikes.

Figure: the Tinybird Pipe results. The initial query is very expensive in both scan size and response time.

If I built an application to visualize this result and refreshed it every second, I would spend $40,340.16/day on Tinybird’s Pro plan. 😳
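That number isn’t hand-waving. Assuming the Pro plan’s processed-data price of roughly $0.07 per GB at the time, the arithmetic is: 6.67 GB per request × 86,400 requests per day × $0.07/GB = $40,340.16 per day.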

Side Note
It's worth pointing out here that nobody is spending $40K a day at Tinybird, or even close to that. Tinybird has observability tools to prevent this from happening, and we're always helping clients reduce their cost through more nuanced optimizations. This is an admittedly extreme case designed to prove a point.

How do we fix it?

Whether you’re a data engineer or not, there are some fundamental rules you can follow to not only spend significantly less but also improve the performance of your queries. We’ve described them in detail in one of our guides, and I’ll list them here:

  • Rule № 1 ⟶ The best data is the data you don’t write.
  • Rule № 2 ⟶ The second best data is the data you don’t read.
  • Rule № 3 ⟶ Sequential reads are 100x faster.
  • Rule № 4 ⟶ The less data you process after reading, the better.
  • Rule № 5 ⟶ Do complex operations later in the processing pipeline.

My original query breaks all five of these rules. Whoops. So here’s how I fixed it:

Filter first, join later

The first and simplest fix is to apply filters first and do the join later. By doing so, I speed up my query by following rule 1. Instead of writing a massive join to memory, I’m filtering first, so that subsequent results are smaller.
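Here’s the same sketch with the order flipped; again, the names and literals are illustrative:

```
NODE filter_events
SQL >
    -- Filter as early as possible so every later step touches fewer rows
    SELECT author_id, event, datetime
    FROM events
    WHERE event = 'login'
      AND datetime >= '2023-01-01' AND datetime < '2023-02-01'

NODE endpoint
SQL >
    -- The join and aggregation now run over a small, pre-filtered set
    SELECT c.country, count() AS total
    FROM filter_events e
    LEFT JOIN country_data c ON c.author_id = e.author_id
    GROUP BY c.country
```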

Figure: the updated data transformation flow, filtering first and joining later.
Figure: the Tinybird Pipe results. The second query improves response time, but still processes the same amount of data.

While that speed boost should deliver a better frontend UX, I still have a problem: The amount of data I read to complete the query hasn’t changed. I’m still spending roughly the equivalent of a 2021 Tesla Model Y every day.

What to do? Let’s take a look at the schema.

Indices and data types

If you peek at the schema, you’ll see a few problems.

First, I’m using expensive String data types where a LowCardinality(String) is more appropriate, because columns like event, device_OS, and device_browser have only a limited number of options. We can grab back some memory just by assigning appropriate data types.

Second, my sorting keys are incomplete based on the filters I’m trying to apply.

Because I’m filtering on event and datetime columns, I should add them to the sorting keys.

By making these changes to the schema of my data source, I’m following rules 2 and 3: reading less data, and reading it sequentially, because my filters now apply to indexed columns.
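As a sketch, the revised .datasource file might look like this. The full column list is illustrative; the changes that matter are the LowCardinality(String) types and the expanded sorting key:

```
SCHEMA >
    `company_id` String,
    `author_id` String,
    `event` LowCardinality(String),
    `device_OS` LowCardinality(String),
    `device_browser` LowCardinality(String),
    `datetime` DateTime

ENGINE "MergeTree"
ENGINE_SORTING_KEY "company_id, event, datetime"
```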

Figure: the updated data source schema, replacing the original for better performance.

Just making these simple changes has massive cost implications, because now, instead of reading every row in the table, the query can target only the granules that fit within my event and datetime filters.

The resulting query now reads only 76 MB in less than 400 ms. I’ve cut costs by 100x, to just $400 a day, and I’ve given the frontend UX a nice boost with a much lower response latency.

Materialized views: the ace up our sleeve

So I’ve just saved $39,600 and it’d be easy to call it a day. But there’s one more thing I can do to squeeze as much as I can out of this query.

In Tinybird, materialized views can easily be created using Pipes. Instead of terminating a Pipe in an API endpoint, you can materialize the results, and reap the benefits later. Materialized views update on ingestion in Tinybird, so they’re always up to date.

The best part? There’s no processing cost to incrementally update the materialized views, and the storage cost is negligible (in this case less than one-thousandth of a cent).

To improve query performance even more, I’ll materialize the join and aggregation first. Instead of calling the result below through an API endpoint, I materialize it into a new data source called join_data_mv.
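In Tinybird datafile terms, the materializing node looks roughly like this. The -State combinator is an assumption on my part: storing aggregate states is what lets the view keep updating incrementally as new rows arrive:

```
NODE join_and_aggregate
SQL >
    -- Pre-join and pre-aggregate at ingestion time, storing aggregate states
    SELECT
        c.country,
        e.event,
        toDate(e.datetime) AS date,
        countState() AS total
    FROM events e
    LEFT JOIN country_data c ON c.author_id = e.author_id
    GROUP BY c.country, e.event, date

TYPE materialized
DATASOURCE join_data_mv
```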

Now, I can query that materialized view instead of the original data source:
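Something like this, where countMerge() collapses the stored states back into plain counts (the 'login' filter and date range remain illustrative):

```
SELECT country, countMerge(total) AS total
FROM join_data_mv
WHERE event = 'login'
  AND date >= '2023-01-01' AND date < '2023-02-01'
GROUP BY country
```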

The final result scans only 333 kB in about 1 ms.

Figure: the final Tinybird Pipe results, showing a massive reduction in query scan size and response time.

I got the same result as my first query, but with a blazing fast frontend and a bill of just $18/day.

Obviously, compared to the reduction from the previous optimization, from $40K to $400, this might seem like only an incremental gain. But this isn’t just about what I saved on this query. It’s about what I’ll save on this query and every query I build after it. By implementing simple best practices and using materialized views, I can not only justify my value as a data practitioner but also have a real shot at generating a meaningful ROI from this analysis.

Do this enough, and maybe your company will just go ahead and buy you a Tesla to say thanks.

Everyone gets a Tesla!

