tinybird
/
guides

Create Materialized Views with Transformation Pipes

Intermediate
When it comes to real-time analytics over huge quantities of data, it's very common to need to have differently organized, pre-filtered or pre-aggregated views of your source data for different use cases. This guide shows you how to create these views with Tinybird's Pipes.

Transformation pipes are a special type of Pipe that persist the result of the transformation in a Materialized View (a Data Source with some special stuff). You can use Materialized Views in your API endpoints as regular Data Sources. These are a great way to optimize your data analysis for real-time use cases.

In certain cases, Materialized Views are key to helping you achieve ultra-fast response times over billions of rows.

How does it work?

The only difference between a Pipe used to build an API endpoint and a Pipe used to create and feed a Materialized View is the output. Instead of enabling dynamic requests over your data, you materialize the results of the transformations in a Data Source (the Materialized View).

Once you configure a Pipe to materialize its output in a Materialized View, the transformation happens on-the-fly in any of these three scenarios:

  1. You ingest data to your any Data Source used in your Pipe.
  2. You do a full replace on any Data Source used in your Pipe.
  3. You force a populate on any Data Source used in your Pipe.

Materialized Views are always synced and ready to be used, with the data organized for concrete use cases.

A practical example using the CLI Tool

We have an {% code-line %}events{% code-line-end %} Data Source which for each action performed in an e-commerce website, stores a timestamp, the user that performed the action, the product, which type of action ({% code-line %}buy{% code-line-end %}, {% code-line %}add to cart{% code-line-end %}, {% code-line %}view{% code-line-end %}, etc.) and a json column containing some metadata, such as the price.

In this guide we will learn how to use the CLI Tool to create Pipes and Materialized Views.

{% tip-box title="INGEST SOME DEMO DATA" %}Follow our Ingesting data guide to learn how to create the events Data Source before starting with this guide.{% tip-box-end %}

The {% code-line %}events{% code-line-end %} data source is expected to store billions of rows per month. Its data schema is as follows:

We want to publish an API endpoint calculating the top 10 products in terms of sales for a date range ranked by total amount sold. Here's where Materialized Views can help us.

As you can see, after doing the desired transformations, all you need to do is set the TYPE parameter to {% code-line %}materialized{% code-line-end %} and add the name of the Data Source, which will materialize the results on-the-fly.

And then, do the rest in the Data Source schema definition for the Materialized View, which we will name {% code-line %}top_products_view{% code-line-end %}:

As you can see, the destination Data Source uses an AggregatingMergeTree engine, which for each {% code-line %}date{% code-line-end %} will store the corresponding {% code-line %}AggregateFunction{% code-line-end %} for the top 10 products and the total sales.

Having the data pre-calculated as it gets ingested, will make the API endpoint run in real-time, no matter the number of rows in the {% code-line %}events{% code-line-end %} Data Source.

Regarding the Pipe we will use to build the API endpoint, which we will call {% code-line %}top_products_agg{% code-line-end %}, it will be as simple as:

{% tip-box title="AGGREGATE FUNCTION MODES" %}Note that when pre-aggregating the Aggregate Function uses the mode {% code-line %}State{% code-line-end %}, while when getting the calculation it makes use of {% code-line %}Merge{% code-line-end %}.{% tip-box-end %}

Once it's done, let's push everything to our Tinybird account:

Note that when pushing the {% code-line %}top_product_per_day.pipe{% code-line-end %} we use the {% code-line %}--populate{% code-line-end %} flag. It'll cause the transformation to be run in a job, and the Materialized View {% code-line %}top_products_view{% code-line-end %} to be populated.

Of course, you can re-populate Materialized Views at any moment:

Common use cases of Materialized Views for real-time analytics

Materialized Views in Tinybird are extremely powerful and very useful for real-time analytics. Here is a list of scenarios where you will potentially benefit greatly from using them:

  • Prepare Data Sources to be joined faster with joinGet and the JOIN engine.
  • Pre-aggregate data with AggregatingMerge or SummingMergeTree.
  • Denormalize data sources by materializing a view joining several Data Sources.
  • Extract data from JSON columns and make it accessible for querying.
  • Aggregate data as arrays with the groupArray function.
  • Feature engineering by calculating additional columns for your analysis, such as geospatial indexes with the geohashEncode function or cityHash64.

Limitations

Unfortunately, you cannot create Materialized Views from the User Interface, but you can see those Data Sources in your Tinybird Dashboard and you can see which nodes are the ones materializing data.

Also, Materialized Views work mostly as insert triggers. That means a delete or truncate operation on your source table won't affect the related Materialized Views.

As transformation and ingestion in the Materialized View is done on each block of inserted data in the source table, certain operations such as {% code-line %}GROUP BY{% code-line-end %}, {% code-line %}ORDER BY{% code-line-end %}, {% code-line %}DISTINCT{% code-line-end %} or {% code-line %}LIMIT{% code-line-end %} might need a specific {% code-line %}engine{% code-line-end %} (such as {% code-line %}AggregatingMergeTree{% code-line-end %} or {% code-line %}SummingMergeTree{% code-line-end %}), that can handle data aggregations in that way.

Materialized Views generated using JOIN clauses are tricky. The resulting Data source will be only automatically updated if and when a new operation is performed over the Data Source in the FROM.

Finally, you cannot create Materialized Views that depend on the UNION of several Data Sources.

ON THIS GUIDE