The 5 rules for writing faster SQL queries

If you're building real-time analytics, you need your SQL to be fast. Fast SQL queries improve performance and reduce cost. Here are the 5 rules to follow to write the fastest SQL queries of your life.
Scalable Analytics Architecture
Aitana Azcona, Head of Base Team

When you’re trying to process huge amounts of data in real-time, there are some best practices to follow. These are hard-learned lessons from years of building real-time systems at huge scale. Now, we're sharing them with you.

The 5 Rules of Fast Queries

At Tinybird, we want our users to be able to go fast, both when writing queries and when running them. So, we've created a cheatsheet of rules you can consult whenever you need to go fast.

We like to call them The 5 Rules of Fast Queries:

  • Rule № 1 ⟶ The best data is the data you don’t write.
  • Rule № 2 ⟶ The second best data is the data you don’t read.
  • Rule № 3 ⟶ Sequential reads are 100x faster.
  • Rule № 4 ⟶ The less data you process (after read), the better.
  • Rule № 5 ⟶ Move complex operations later in the processing pipeline.

Let’s go through them one by one, analyzing the performance improvement from each rule using data stored in Tinybird. We will use the well-known NYC Taxi Trip dataset. You can get a sample here and import it directly into Tinybird by creating a new Data Source from your dashboard.

Rule 1: The best data is the data you don’t write

This rule seems obvious, but it’s not always followed. There is no reason to store data that you don’t need: it increases storage (and cost!) and makes queries slower, so it has only disadvantages.

Rule 2: The second best data is the data you don’t read

To avoid reading data that you don’t need, you should apply filters as soon as possible.

Filtering is the fastest operation, so you should always filter first.

Using the NYC taxi data, let’s suppose we want a list of the trips whose distance is greater than 10 miles and that took place between ‘2017-01-31 14:00:00’ and ‘2017-01-31 15:00:00’. Additionally, we want to get those trips ordered by date.

Let’s see the difference in performance when we apply the filters at the end of our query execution versus the beginning.

For the first approach, let’s start by ordering all the data by date:
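The original post shows each of these queries as a screenshot of a Tinybird node. The sketches below reconstruct them; the Data Source name (`trips`) and node names are illustrative.

```sql
-- Node 1: sort the entire Data Source by pickup date
SELECT *
FROM trips
ORDER BY tpep_pickup_datetime ASC
```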

Once the data is sorted, we filter it:
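A sketch of the second node, which queries the output of the first:

```sql
-- Node 2: filter the already-sorted data
SELECT *
FROM sorted_trips  -- illustrative name for the previous node
WHERE trip_distance > 10
  AND tpep_pickup_datetime BETWEEN '2017-01-31 14:00:00' AND '2017-01-31 15:00:00'
```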

In Tinybird, you can chain multiple nodes of SQL, each querying the output of prior nodes. This node filters the sorted output of the previous node.

This first approach takes around 30-60 ms if you add up the processing time of both nodes.

Pay attention to the statistics: the first node scanned 139.26k rows and processed 10.31 MB of data. The second node scanned 24.58k rows and 1.82 MB of data. Why scan 139.26k rows in the first place if we really only need to scan 24.58k?

It’s important to be aware that these two values directly impact the query execution time and also affect other queries you may be running at the same time. IO bandwidth is also something you need to keep in mind.

Now, let’s see what happens if the filter is applied before the sorting:
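A sketch of the single node that filters before sorting:

```sql
-- Filter first, then sort only the rows that survive the filter
SELECT *
FROM trips
WHERE trip_distance > 10
  AND tpep_pickup_datetime BETWEEN '2017-01-31 14:00:00' AND '2017-01-31 15:00:00'
ORDER BY tpep_pickup_datetime ASC
```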

As you can see, if the filter is applied before the sorting, the query takes < 10 ms. If you take a look at the size of the data read, it’s 1.82 MB, and the number of rows read is 24.58k. Compared to the previous approach, these are much smaller and more efficient.

This significant difference happens because in the first approach, we are sorting all the data available (even the data that we don’t need for our query) while in the second approach, we are sorting just the rows we need.

Filtering is the fastest operation, so always filter first.

Rule 3: Sequential reads are 100x faster

To be able to carry out sequential reads, it’s essential to define indexes correctly. These indexes should be defined based on the queries that we're going to perform. In this example, we'll simulate indexing data by sorting our tables before we query them to see how filtering by non-indexed columns affects performance.

Always index the columns by which you plan to filter. Sequential reads can be up to 100x faster when working with large amounts of data.

For example, if we want to query the data and filter by tpep_pickup_datetime, let’s compare what happens when the data is sorted by tpep_pickup_datetime versus when it’s sorted by any other column.

In the first approach, we will sort the data by another column, for instance, passenger_count:
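Sketched, with illustrative names:

```sql
-- Node 1: sort by a column we won't filter on
SELECT *
FROM trips
ORDER BY passenger_count ASC
```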

Once we have the data sorted by passenger_count, we filter it by tpep_pickup_datetime:
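The exact filter values aren’t visible in the original screenshot; the time window below is illustrative:

```sql
-- Node 2: filter by pickup date over data sorted by passenger_count
SELECT *
FROM sorted_by_passenger_count  -- illustrative name for the previous node
WHERE tpep_pickup_datetime BETWEEN '2017-01-31 14:00:00' AND '2017-01-31 15:00:00'
```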

This node filters by date over data that hasn't been sorted by date. The results are suboptimal.

This approach takes around 5-10 ms; the number of scanned rows is 26.73k and the size of the data is 1.98 MB.

For the second approach, we'll sort the data by tpep_pickup_datetime:
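Again sketched:

```sql
-- Node 1: this time, sort by the column we will filter on
SELECT *
FROM trips
ORDER BY tpep_pickup_datetime ASC
```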

And once it’s sorted by date, we filter it:
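A sketch of the filtering node:

```sql
-- Node 2: the same kind of filter now reads sequentially-stored data
SELECT *
FROM sorted_by_pickup_datetime  -- illustrative name for the previous node
WHERE tpep_pickup_datetime BETWEEN '2017-01-31 14:00:00' AND '2017-01-31 15:00:00'
```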

This node filters by date over data that has been sorted by date. The query scans much less data and responds much faster.

We can see that if the data is sorted by tpep_pickup_datetime and the query uses tpep_pickup_datetime for filtering, it takes just 1-2 ms, scans only 10.35k rows, and processes only 765.53 KB. The first approach, filtering by another column, takes around 5-10 ms, scans 26.73k rows, and processes 1.98 MB.

It’s important to highlight that the more data we have, the greater the difference between the two approaches. When dealing with tons of data, sequential reads can be up to 100x faster or more.

Therefore, it’s essential to define indexes based on the queries that will be made.

Rule 4: The less data you process (after read), the better

The less data you process, the faster and cheaper your queries will be. So, if you just need two columns, only retrieve two columns in your SELECT.

Let’s suppose that for our use case, we just need three columns: vendorid, tpep_pickup_datetime, and trip_distance.

Don't run query operations over columns you don't need.

Let’s analyze the difference between selecting all the columns versus just the ones we need.

When we select all the columns, the query takes around 140-180 ms and processes 718.55 MB of data:
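A sketch of that query:

```sql
-- Reads every column in the Data Source, even those we don't need
SELECT *
FROM trips
```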

However, when we select just the columns we need, the query only takes around 35-60 ms and processes ~20% of the data:
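And the optimized version:

```sql
-- Reads only the three columns we actually need
SELECT vendorid, tpep_pickup_datetime, trip_distance
FROM trips
```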

As mentioned before, you can see that the amount of scanned data is much smaller: now just 155.36 MB. With columnar analytical databases, if you don’t retrieve a column, its files are never read, which is much more efficient.

Therefore, you should only process the data that you need.

Rule 5: Move complex operations later in the processing pipeline

Complex operations, such as joins or aggregations, should be performed as late as possible in the processing pipeline. This is because filtering should happen in the first steps, so fewer rows remain by the end of the pipeline and the cost of executing complex operations is lower.

So first, let’s aggregate the data:
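A sketch of the aggregate-first approach (the aggregation keys are illustrative):

```sql
-- Node 1: aggregate the full dataset, including rows we'll discard later
SELECT
    vendorid,
    toDate(tpep_pickup_datetime) AS day,
    count() AS trips_count,
    avg(trip_distance) AS avg_distance
FROM trips
GROUP BY vendorid, day
```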

Now, let’s apply the filter:
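And a sketch of the filtering node:

```sql
-- Node 2: filter the already-aggregated result
SELECT *
FROM aggregated_trips  -- illustrative name for the previous node
WHERE day = '2017-01-31'
```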

This query filters over aggregated data. The results are suboptimal.

If the aggregations are performed before filtering, the query takes around 50-70 ms in total, scanning 9.71 million rows and processing 77.68 MB.

In general, do SQL operations in this order: 1. Filter - 2. JOIN - 3. Aggregate.

Let’s see what happens if we filter first and then aggregate:
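Sketched as a single node:

```sql
-- Filter first, then aggregate only the surviving rows
SELECT
    vendorid,
    toDate(tpep_pickup_datetime) AS day,
    count() AS trips_count,
    avg(trip_distance) AS avg_distance
FROM trips
WHERE toDate(tpep_pickup_datetime) = '2017-01-31'
GROUP BY vendorid, day
```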

This approach takes only 20-40 ms, even though the number of scanned rows and the size of the data are the same as in the previous approach.

Therefore, you should perform complex operations as late as possible in the processing pipeline.

Some additional guidance

In addition to these 5 rules, here’s some more general advice for optimal queries in Tinybird:

Avoid Full Scans

The less data you read in your queries, the faster they are. There are several strategies you can follow in Tinybird to avoid reading all the data in a Data Source (a full scan):

  • Always filter first.
  • Use indices by setting a proper ENGINE_SORTING_KEY in the Data Source.
  • The columns in the ENGINE_SORTING_KEY should be the ones you use for filtering in the WHERE clause. You don’t need to sort by every column you filter on, only those you filter by first.
  • The order of the columns in the ENGINE_SORTING_KEY matters: from left to right, ordered by relevance (the most important ones for filtering) and cardinality (lower cardinality goes first), as in the sketch below.
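For example, a Data Source definition for this dataset might look like the following sketch (column types are illustrative):

```
SCHEMA >
    `vendorid` Int32,
    `tpep_pickup_datetime` DateTime,
    `trip_distance` Float32,
    `passenger_count` Int16

ENGINE "MergeTree"
ENGINE_SORTING_KEY "vendorid, tpep_pickup_datetime"
```

Here vendorid (low cardinality) goes first, and tpep_pickup_datetime supports the date filters used throughout this post.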

Avoid Big Joins

When doing a JOIN in Tinybird, the data in the right Data Source is loaded into memory to perform the JOIN. Therefore, it’s recommended to filter the right Data Source before joining rather than joining big Data Sources whole.

JOINs over tables of more than 1M rows might lead to MEMORY_LIMIT errors when used in Materialized Views, affecting ingestion.

A common pattern to improve JOIN performance is the one below:
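A minimal sketch of that pattern, with illustrative table and column names:

```sql
-- Pre-filter the right side so only rows that can actually match
-- are loaded into memory for the JOIN
SELECT
    left_ds.id,
    left_ds.value,
    filtered_right.name
FROM left_ds
INNER JOIN (
    SELECT id, name
    FROM right_ds
    WHERE id IN (SELECT id FROM left_ds)
) AS filtered_right ON left_ds.id = filtered_right.id
```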

New to Tinybird?

Tinybird is a real-time data platform for developers and data teams. Ingest data from anywhere, query it with SQL, and publish your (optimized) queries as low-latency APIs in a click. You can sign up for free with no time limit and no credit card required.
