When you should use columnar databases and not Postgres, MySQL, or MongoDB

Row-oriented, OLTP databases aren't ideal application DBs when you know you'll need to run analytics on lots of data. Choose a column-oriented OLAP database instead.
The Data Base
Javi Santana, Co-founder

When you develop an application, your first choice for a database is often a relational database like Postgres or MySQL, or a NoSQL database like MongoDB. You can't go wrong with any of these traditional databases for general software development; they're great general-purpose databases with huge communities and some excellent features (e.g., transactions) that make developers' lives easier. When it comes to analytical workloads, however, columnar databases significantly outperform all of them.

But there comes a point when things start to slow down. So you read a few blog posts, scan some docs, browse some dev forums, and spend hours tuning queries and database config. This improves things temporarily, but eventually you hit a wall. What you really need is a columnar database. For a detailed performance comparison of MySQL vs. ClickHouse for analytics workloads, see our in-depth guide; to learn how to write efficient ClickHouse queries, check out our comprehensive guide.

If you cut your backend teeth on row-oriented databases like Postgres or MySQL, it'd be understandable if you thought that most applications should be built on these or similar databases; the row-oriented, online transactional processing (OLTP) approach lends itself well to most app development that's happened over the last decade.

But then you realize there are other databases out there focused specifically on real-time analytical use cases with lots of data and complex queries. Newcomers like ClickHouse, Pinot, and Druid (all open source) respond to a new class of problem: the need to build applications over big data with analytical queries that were previously confined to data warehouses like Snowflake, BigQuery, or Amazon Redshift and the business intelligence tools they often feed. These online analytical processing (OLAP) data stores are optimized for fast writes and fast reads over big data.

To use a metaphor: if you want to find 3 specific trees in a massive forest, row-oriented OLTP is great. But if you want to count all the trees? That’s where column-oriented OLAP comes in.

When you discover column-oriented databases, all of a sudden your general-purpose database doesn’t feel so “general-purpose” anymore. And you realize that maybe row-based databases like Postgres and MySQL or document databases like MongoDB aren’t the databases you’re looking for to tackle your next project.

Tinybird is a real-time, columnar database with a built-in API layer that’s perfect for building applications over analytical queries: You can ingest lots of data into a columnar database, query it with SQL, and instantly publish your analytical queries as APIs with which to build your applications.


Why choose a columnar database?

Why can't every database management system (DBMS) be good for both transactional and analytical workloads? The answer is basic physics. Hardware has hard limits, and you have to organize data one way for transactional use cases and another way for analytical ones.

More specifically, it has to do with how data is physically stored and processed. Column-oriented databases like ClickHouse and Tinybird store data in a columnar fashion: all the values in a column live together on disk. In row-oriented, OLTP databases, data is stored by row, so all the values in a row stay together.

And this is very important.

Columnar databases store data in columns of data together on disk. Traditional row-oriented OLTP databases store data in rows of data together on disk. This makes a big difference for analytical data processing.

Data locality matters.

If you go to the grocery store to buy 100 cans of Coca-Cola, you hope to find them packaged in 24-packs all on the same shelf. You go to one place, grab the packs you need, check out, and you’re on to your next errand in 5 minutes tops.

But if the cans are spread out all over the store, behind the bananas and beside the corn starch, you'd need to push your shopping cart from end to end to collect them all. You’d be lucky to get out of there before closing time.

Now consider what it might be like if your cans of soda are all stored in different stores! Now we’re getting into the realm of distributed systems, Apache Cassandra, HBase, and cloud data warehousing.

The same is true of data: if data sits together on disk or in memory, reading and processing it is way faster.

This is because disks and memory can be orders of magnitude faster when access is sequential. And CPUs process data much faster when they don't need to jump between different tasks. If you're going to access data quickly, you don't want it distributed over many disks on many different machines. You want it on one disk, stored close together.

Disks, memory, and CPU work way faster when data is close together.

But of course, cans of soda aren’t data and it’s not fair to compare them. Cans of soda are physical objects, data is not. If the goal is to optimize your “checkout time”, then there are many things you can do with data storage to speed things up:

Compress it

We all use compression to save space on our disks. If you're old like me, you probably used WinZip. In general, we all understand that if you compress data, it gets smaller, and smaller is usually better when storing anything. It's why fast food chains buy soda syrup and carbonated water in bulk instead of stacking hundreds of 12-packs next to the burger patties.

Compression is a general term, and there are many ways to compress data. But there’s one thing common to all compression methods: compression works much better if similar values are together.
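As a rough illustration of that last point, here's a plain-Python sketch (standard library only; the table and column names are made up) that serializes the same 10,000-row table row-wise and column-wise, then compresses both with zlib:

```python
import struct
import zlib

# Hypothetical table: 10,000 rows of (user_id, country_id, price_cents).
rows = [(i, i % 5, 999) for i in range(10_000)]

# Row-oriented layout: each row's three values sit next to each other.
row_bytes = b"".join(struct.pack("<iii", *row) for row in rows)

# Column-oriented layout: all values of each column sit together.
col_bytes = b"".join(
    struct.pack(f"<{len(rows)}i", *(row[c] for row in rows)) for c in range(3)
)

# Same 120 KB of data, but the column layout puts similar values
# side by side, so a generic compressor does much better on it.
row_size = len(zlib.compress(row_bytes))
col_size = len(zlib.compress(col_bytes))
print(f"row-oriented: {row_size} bytes, column-oriented: {col_size} bytes")
```

The constant `price_cents` column and the low-cardinality `country_id` column compress to almost nothing once their values are adjacent; interleaved with the row layout, those runs are broken up.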

Vectorize it

Many years ago, video games started to use 3D graphics that required complex vector calculations to move things in 3D space. Intel (the market leader at the time) created MMX technology and later SSE to enable CPUs to do vector math quickly: in one CPU cycle, the CPU performs several operations instead of one. Vectorization lets CPUs process many values at the same time, but it's only possible if those values are stored together.
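You can get a feel for the batch-vs-scalar difference even without real SIMD. In the sketch below, Python's built-in `sum()` over a contiguous array merely stands in for a vectorized loop, so treat it as an analogy, not a SIMD demo:

```python
import time
from array import array

# Two million floats in one contiguous buffer.
values = array("d", (float(i % 100) for i in range(2_000_000)))

# Scalar path: one value per iteration, interpreter overhead on each.
start = time.perf_counter()
total = 0.0
for v in values:
    total += v
loop_time = time.perf_counter() - start

# Batch path: one call that runs a tight C loop over the whole buffer,
# the same shape of win that SIMD gives a database engine.
start = time.perf_counter()
batch_total = sum(values)
batch_time = time.perf_counter() - start

print(f"loop: {loop_time:.3f}s, batch: {batch_time:.3f}s")
```

Both paths produce the same total; the batch path just spends far fewer cycles per value, which is exactly what a columnar engine exploits.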

Cache it

In a somewhat famous slide titled “Numbers Everyone Should Know” (slide 13 in this presentation), Jeff Dean lists retrieval times by data location. By his numbers, L1 cache access is roughly 200x faster than main-memory access. If you cache, you go fast.

And it turns out that caching works best when data is close together. Combine this with compression so more data fits in the cache, and you can feed the CPU that much faster.

Sort it

Sorting is another huge performance factor, especially for columnar databases. If the data is sorted by the columns you'll be filtering on, everything speeds up: it compresses better, access is faster, and data locality is much, much better. Sorting also improves operations like joins, order by, limit, and group by. Proper sorting can speed up queries by multiple orders of magnitude.
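A small sketch of why a sort key matters: on sorted data, a range filter becomes two binary searches plus one contiguous read, instead of a full scan. Plain Python with hypothetical timestamp data:

```python
import bisect
import random

random.seed(42)
# Hypothetical event timestamps, kept sorted -- i.e., the table's sort key.
timestamps = sorted(random.randrange(1_000_000) for _ in range(500_000))

# With sorted data, a range filter is two binary searches plus one
# contiguous slice read: no full scan required.
lo = bisect.bisect_left(timestamps, 250_000)
hi = bisect.bisect_right(timestamps, 260_000)
in_range = timestamps[lo:hi]

# The equivalent full scan touches every single value.
scanned = [t for t in timestamps if 250_000 <= t <= 260_000]

print(len(in_range), "matching rows")
```

Both approaches return identical rows, but the sorted path inspects only a handful of values plus one contiguous slice, which is also the access pattern disks and caches love.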

Parallelize it

One more thing: parallelization. It’s not unique to analytical databases, but it helps a lot when you have a lot of data, which is when you typically consider using an analytical database. There are several types of parallelization: inside the CPU (aka vectorization), across multiple CPUs, and even across multiple machines. I’ll talk about it more in a bit.
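The split-and-merge shape of parallelization can be sketched in a few lines. Here threads stand in for cores or machines; this shows the pattern, not a performance win (CPython threads share one interpreter lock):

```python
from concurrent.futures import ThreadPoolExecutor

values = list(range(2_000_000))
n_workers = 4
chunk = len(values) // n_workers

def partial_sum(i):
    # Each worker scans its own independent slice of the data.
    lo = i * chunk
    hi = len(values) if i == n_workers - 1 else lo + chunk
    return sum(values[lo:hi])

# Split the scan, run the pieces concurrently, merge the partial results.
with ThreadPoolExecutor(max_workers=n_workers) as pool:
    total = sum(pool.map(partial_sum, range(n_workers)))

print(total)
```

The merge step works because summation is associative; the same property is what lets analytical databases spread an aggregation across many cores or shards and combine the partials at the end.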

Because analytical use cases almost always involve aggregating and filtering on data stored in columns, running analytical queries against columnar storage is much faster.

All these factors combined, it’s pretty simple: Things go faster when the data you want to access is stored together. And because analytical use cases pretty much always involve aggregating and filtering on data in columns, running these analytical queries over columns of data is just much faster. The hardware is optimized to count all the “trees” in as few cycles as possible. This is the main reason why column-oriented storage is better for analytics.

Other use cases for columnar databases

The low-level benefits of column-oriented databases for analytics should be pretty clear at this point: Data locality to extract 100% of the hardware is huge for speeding up queries on large amounts of data involving aggregations and filters typical in analytical use cases. But it doesn’t stop there. Analytical databases have other properties that make them even more appealing for handling big data.

Probabilistic data structures

When you run analytics, you often don’t need many things that OLTP databases offer. One of those things is exactness, especially in statistical calculations. That might not seem like a big deal, but if you’re building on top of analytical databases, it has big implications.

If you are allowed to have a small error in your statistics, say +/- 1%, it can mean much, much faster queries. Calculating unique values, for example, is very compute-intensive and requires a lot of memory. If you can get by with some error, you can use probabilistic data structures like HyperLogLog that estimate unique values with less memory and less CPU.
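HyperLogLog itself is fiddly to implement, but a simpler cousin, the k-minimum-values (KMV) sketch, shows the same trade: bounded memory in exchange for a small, tunable error. This is an illustrative sketch, not Tinybird's or ClickHouse's implementation:

```python
import hashlib
import heapq

def kmv_estimate(values, k=1024):
    """Estimate distinct count by tracking only the k smallest 64-bit
    hashes. If hashes are uniform, the k-th smallest lands near
    k / n of the hash space, so n is roughly (k - 1) / that fraction."""
    heap = []        # max-heap (via negation) holding the k smallest hashes
    members = set()  # heap contents, for O(1) duplicate checks
    for v in values:
        digest = hashlib.blake2b(str(v).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct values seen: exact
    return int((k - 1) * 2**64 / -heap[0])

exact = 50_000
stream = [f"user-{i % exact}" for i in range(200_000)]  # repeats included
estimate = kmv_estimate(stream)
print(f"exact: {exact}, estimate: {estimate}")
```

Memory stays at k entries no matter how many distinct values flow through, and two sketches can be merged by keeping the k smallest hashes across both, which is exactly why these structures parallelize so well.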

Eventual consistency is also important here; it's usually not acceptable for OLTP workloads, but it's not a problem in analytics. When you have a lot of data, a single machine often isn't enough to run your analytical workloads. Of course, you should always try to scale vertically if you can, but eventually you'll need to put data on several machines (sharding and replication are the terms for this).

This is a well-trodden path: coordination in a distributed system isn't trivial, but there are numerous books written about it, and you can pretty easily set up a cluster with several machines to scale your reads and writes. But the benefits go beyond basic horizontal scaling.

Probabilistic data structures improve query performance when exactness isn't required, and they are highly parallelizable.

I talked earlier about parallelization at different levels. It turns out that parallelization, distributed systems, and probabilistic data structures get along quite well.

The same algorithms and methods used to parallelize a workload on many cores work well when you parallelize it on many machines. Probabilistic data structures themselves happen to be highly parallelizable as well.

Analytical databases take advantage of this. Tinybird, for example, supports a number of non-exact SQL functions like uniq() and quantileDeterministic() that deterministically estimate their respective statistics. As data volumes get bigger and bigger, this has a meaningful impact on query latency.

Faster writes with LSM tree

An important component of analytical architectures is the log-structured merge-tree (LSMT), a data structure that many new databases (and some “old”) use because it aligns well with how hardware works.

ClickHouse, for example, uses an LSMT-like structure that lets you insert millions of rows per second without any problems.

In a lot of analytical use cases, you don’t just need your queries to be fast, you also need them to query the freshest data. This is especially important for real-time use cases where you need to serve low-latency analytics on streaming data. Every millisecond counts. Just ask the day traders.
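The write path of an LSM tree can be modeled in miniature: writes land in an in-memory memtable, full memtables are flushed as immutable sorted runs, and reads check the freshest data first. A toy sketch only, far removed from ClickHouse's actual MergeTree internals:

```python
import bisect

class TinyLSM:
    """Toy LSM tree: O(1) in-memory writes, periodic flushes to
    immutable sorted runs. A real engine also merges runs in the
    background and writes them to disk."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []  # sorted (key, value) lists, oldest first

    def put(self, key, value):
        self.memtable[key] = value            # fast write, no random I/O
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into an immutable sorted run.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:              # freshest data first
            return self.memtable[key]
        for run in reversed(self.runs):       # then newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
for i in range(10):
    db.put(f"k{i}", i)
db.put("k3", 99)  # overwrite: newer data shadows the old value
print(db.get("k3"), db.get("k7"), db.get("missing"))
```

The key property is that inserts never rewrite existing data in place, which is what lets LSM-style engines absorb millions of rows per second while keeping reads over sorted runs fast.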

Incremental rollups and materializations

Almost all databases support some form of materialized views. But in most databases, including Postgres, MySQL, and MongoDB, the materialized views need to be periodically and manually refreshed. Analytical databases, on the other hand, usually have special tables and structures to enable incremental materializations, rollups, and other kinds of aggregations. The result is faster queries over aggregations even on datasets with high-frequency inserts.
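The idea of an incremental materialization, reduced to its essence: maintain the aggregate at write time so reads never rescan the raw rows. A toy sketch with invented table and view names:

```python
from collections import defaultdict

raw_events = []                 # append-only "source table"
daily_views = defaultdict(int)  # incrementally maintained "rollup view"

def insert(day, url):
    raw_events.append((day, url))
    daily_views[day] += 1       # the view updates at write time

for day, url in [("2024-01-01", "/a"), ("2024-01-01", "/b"),
                 ("2024-01-02", "/a")]:
    insert(day, url)

# A dashboard query reads the tiny rollup, never the raw events.
print(dict(daily_views))
```

Contrast this with a periodically refreshed materialized view, which recomputes the aggregate from scratch and serves stale results between refreshes.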

Specialized functions for statistics and time series

It’s hard to find an analytical use case that doesn’t involve time series data, statistical functions, or both. Most columnar database designers understood this, so they designed their DBMS with specialized functions for time series data and statistics. Tinybird has a host of specialized functions for dealing with dates and times that you mostly don’t get with Postgres, for example.

Columnar databases like ClickHouse or Tinybird come with functions and structures that are optimized for analytical use cases.

Columnar databases aren't perfect, but they are useful.

Column-oriented databases aren't perfect by any means. They're often a huge pain, not necessarily because they're harder to manage, but because they let you store and process way more data, and things just get harder in general.


But maybe, as you approach your next project, you should stop thinking in terms of transactions, linearizability, fast point queries, super advanced search indices, and the other trappings of row-oriented databases.

Instead, think about the types of queries you'll need to run and how much data you'll have. If you need to run sums over billions of rows, you're going to want a columnar database. For workloads requiring both transactional updates and analytical queries, HTAP systems like SingleStore offer a hybrid approach; see our [ClickHouse vs SingleStore comparison](https://www.tinybird.co/blog-posts/clickhouse-vs-singlestore-olap-vs-htap) for details. Cloud-native alternatives like Databend offer flexible scaling for AI workloads.

Fortunately, Tinybird can make working with a columnar database feel more akin to working with something traditional like Postgres. If you're developing software that needs to handle analytical workloads, you should give Tinybird a try. The Free Plan is free forever with no time restrictions and generous free limits.

Copyright © 2025 Tinybird. All rights reserved
