Using Pipes to create Materialized Views

When it comes to real-time analytics over huge quantities of data, it's very common to need differently organized, pre-filtered or pre-aggregated views of your source data for different use cases. This guide shows you how to do so with Tinybird's Pipes.

Transformation Pipes are a special type of Pipe that persist the result of the transformation in a Materialized View (a special kind of Data Source). You can use Materialized Views in your API endpoints as regular Data Sources, and they are a great way to optimize your data analysis for real-time use cases.

In certain cases, Materialized Views are key to helping you achieve ultra-fast response times over billions of rows.

How does it work?

The only difference between a Pipe used to build an API endpoint and a Pipe used to create and feed a Materialized View is the output. Instead of enabling dynamic requests over your data, you materialize the results of the transformations in a Data Source (the Materialized View).

Once you configure a Pipe to materialize its output in a Materialized View, the transformation happens on-the-fly in any of these three scenarios:

  1. You ingest data into any Data Source used in your Pipe.
  2. You do a full replace on any Data Source used in your Pipe.
  3. You force a populate on any Data Source used in your Pipe.

Materialized Views are always synced and ready to be used, with the data organized for concrete use cases.

A practical example using the CLI Tool

We have an {% code-line %}events{% code-line-end %} Data Source which, for each action performed on an e-commerce website, stores a timestamp, the user that performed the action, the product, the type of action ({% code-line %}buy{% code-line-end %}, {% code-line %}add to cart{% code-line-end %}, {% code-line %}view{% code-line-end %}, etc.) and a JSON column containing some metadata, such as the price.

In this guide we will learn how to use the CLI Tool to create Pipes and Materialized Views.

{% tip-box title="INGEST SOME DEMO DATA" %}Follow our Ingesting data guide to learn how to create the events Data Source before starting with this guide{% tip-box-end %}

The {% code-line %}events{% code-line-end %} Data Source is expected to store billions of rows per month. Its data schema is as follows:
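A minimal {% code-line %}events.datasource{% code-line-end %} file matching that description could look like this (the exact column names, such as {% code-line %}extra_data{% code-line-end %}, are illustrative):

```
SCHEMA >
    `date` DateTime,
    `product_id` String,
    `user_id` String,
    `event` String,
    `extra_data` String

ENGINE "MergeTree"
ENGINE_SORTING_KEY "date"
```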

We want to publish an API endpoint that calculates the top 10 products in terms of sales for a date range, ranked by total amount sold. This is where Materialized Views can help us.

As you can see, after doing the needed transformations, all you need to do is set the TYPE parameter to {% code-line %}materialized{% code-line-end %} and add the name of the Data Source that will materialize the results on-the-fly.
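As a sketch, a {% code-line %}top_product_per_day.pipe{% code-line-end %} file materializing into {% code-line %}top_products_view{% code-line-end %} could look like this (the node names and the JSON extraction of the price are illustrative):

```
NODE only_buy_events
SQL >
    SELECT
        toDate(date) AS date,
        product_id,
        JSONExtractFloat(extra_data, 'price') AS price
    FROM events
    WHERE event = 'buy'

NODE top_per_day
SQL >
    SELECT
        date,
        topKState(10)(product_id) AS top_10,
        sumState(price) AS total_sales
    FROM only_buy_events
    GROUP BY date

TYPE materialized
DATASOURCE top_products_view
```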

Then, define the rest in the Data Source schema definition for the Materialized View, which we will name {% code-line %}top_products_view{% code-line-end %}:
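A {% code-line %}top_products_view.datasource{% code-line-end %} definition along these lines would work (the aggregate function types shown are one possible choice):

```
SCHEMA >
    `date` Date,
    `top_10` AggregateFunction(topK(10), String),
    `total_sales` AggregateFunction(sum, Float64)

ENGINE "AggregatingMergeTree"
ENGINE_SORTING_KEY "date"
```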

As you can see, the destination Data Source uses an AggregatingMergeTree engine which, for each {% code-line %}date{% code-line-end %}, stores the corresponding {% code-line %}AggregateFunction{% code-line-end %} states for the top 10 products and the total sales.

Having the data pre-calculated as it gets ingested makes the API endpoint run in real time, no matter how many rows the {% code-line %}events{% code-line-end %} Data Source contains.

The Pipe we will use to build the API endpoint, which we will call {% code-line %}top_products_agg{% code-line-end %}, is as simple as:
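A sketch of that {% code-line %}top_products_agg.pipe{% code-line-end %} could look like this (the {% code-line %}start_date{% code-line-end %} and {% code-line %}end_date{% code-line-end %} parameter names and defaults are illustrative):

```
NODE endpoint
SQL >
    %
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10,
        sumMerge(total_sales) AS total_sales
    FROM top_products_view
    WHERE date BETWEEN {{Date(start_date, '2020-01-01')}} AND {{Date(end_date, '2020-12-31')}}
    GROUP BY date
```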

{% tip-box title="AGGREGATE FUNCTION MODES" %}Note that when pre-aggregating, the aggregate functions use the {% code-line %}State{% code-line-end %} mode, while when retrieving the calculated results they use {% code-line %}Merge{% code-line-end %}{% tip-box-end %}

Once everything is ready, let's push it all to our Tinybird account:
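Assuming the file names used above, the push commands would look like this:

```shell
tb push events.datasource
tb push top_products_view.datasource
tb push top_product_per_day.pipe --populate
tb push top_products_agg.pipe
```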

Note that when pushing {% code-line %}top_product_per_day.pipe{% code-line-end %} we use the {% code-line %}--populate{% code-line-end %} flag. This runs the transformation in a job and populates the Materialized View {% code-line %}top_products_view{% code-line-end %}.

Of course, you can re-populate materialized views at any moment:
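For instance, one way to do it is to push the Pipe again, combining the {% code-line %}--populate{% code-line-end %} flag mentioned above with {% code-line %}--force{% code-line-end %} to overwrite the existing Pipe:

```shell
tb push top_product_per_day.pipe --force --populate
```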

Common cases when using Materialized Views for real-time analytics

Materialized Views in Tinybird are extremely powerful and very useful for real-time analytics. Here is a list of scenarios where you can benefit greatly from using them:

  • Prepare Data Sources to be joined faster with joinGet and the JOIN engine.
  • Pre-aggregate data with AggregatingMergeTree or SummingMergeTree.
  • Denormalize data by materializing a view that joins several Data Sources.
  • Extract data from JSON columns and make it accessible for querying.
  • Aggregate data as arrays with the groupArray function.
  • Engineer features by calculating additional columns for your analysis, such as geospatial indexes with the geohashEncode function or hashes with cityHash64.


Unfortunately, you cannot create Materialized Views from the User Interface, but you can see those Data Sources in your Tinybird Dashboard, as well as which nodes are the ones materializing data.

Also, Materialized Views work essentially as insert triggers. That means a delete or truncate operation on your source table won't affect the related Materialized Views.

Since the transformation and ingestion into the Materialized View happen on each block of inserted data in the source table, certain operations such as {% code-line %}GROUP BY{% code-line-end %}, {% code-line %}ORDER BY{% code-line-end %}, {% code-line %}DISTINCT{% code-line-end %} or {% code-line %}LIMIT{% code-line-end %} might need a specific {% code-line %}engine{% code-line-end %} (such as {% code-line %}AggregatingMergeTree{% code-line-end %} or {% code-line %}SummingMergeTree{% code-line-end %}) that can handle data aggregated in that way.

Materialized Views generated using JOIN clauses are tricky. The resulting Data Source will only be updated automatically when new data is ingested into the Data Source in the FROM clause.

Finally, you cannot create Materialized Views that depend on the UNION of several Data Sources.