Back

Factorial builds real‑time data products with Tinybird

Factorial turned to Tinybird to build 12 new user‑facing analytics features in just a few months.

About the company

Factorial is a modern HR software platform present in over 65 countries, bringing HR solutions to more than 8,000 businesses worldwide by automating mundane HR tasks so People leaders can focus on people, not paperwork.

18.853TB
processed per month
7,675
requests per day

Tinybird allows you to have the same kind of technology that the Facebooks, Googles, and LinkedIns of the world have, but without any of the infrastructure or maintenance. It's truly first class.

Marc González

Director of Data at Factorial

Problem

Factorial's traditional batch pipeline using MySQL, AWS Glue, S3, and Athena couldn't meet two non-negotiable requirements for user-facing features: data freshness and low query latency. The schedule-driven pipelines made data stale (hours or days old), and developers couldn't build features users expected.

Their existing data engineering team was relatively small, and they didn't have time to set up and maintain a complex streaming data architecture.

Why Tinybird

Factorial chose Tinybird to handle real-time infrastructure at scale using their existing SQL skills. Tinybird's native Confluent connector allowed them to ingest CDC streams in real-time, and data could be enriched with historical data and materialized for instant developer access.

Results

  • 1 month to production. Completed POC and launched first production feature in just one month.
  • 12 new features shipped. Launched more than 12 user-facing product features in the following six months.
  • Sub-50ms query latency. Reduced average query time from minutes to under 50 milliseconds.
  • Small team, big impact. Only two data engineers power the entire real-time analytics platform.

1. Enabling new scenarios with real-time data

Using Tinybird, Factorial has improved its data freshness and reduced query latency, leading to significantly faster user feature launches. Factorial's decision enabled them to accelerate their time to market and build great new customer experiences like Job Catalog, Audit Log, and Attendance without sacrificing reliability.

Factorial HR dashboard

2. Real-time analytics and more use cases

Like many companies, Factorial began with a traditional batch pipeline using MySQL, Parquet, AWS Glue, Amazon S3, and Amazon Athena that was easy to set up in its early days, guided by the team's previous experience building scalable data architectures.

This was a simple and effective setup, and at the time, their data team consisted of a single engineer. However, as the product evolved, developers increasingly needed to use this data to build user-facing features, and this architecture did not satisfy two non-negotiable requirements: data freshness and low query latency.

Although the lake house architecture proved easy to implement and served internal reporting use cases perfectly, the schedule-driven pipelines to load the data made it difficult for developers to work with. When interacting with user-facing analytical features, users demand up-to-date data. In many cases, the data available in the lake house was stale, in some cases days, or at best, hours, old. This dramatically reduced the value of the data to an end user and thus made it unattractive for developers to use the data to power new product features.

The developers made their requirements clear: they needed fresh data.

And they needed access to their data with low latency.

That's what brought them to Tinybird.

Our existing data engineering team is relatively small given the size of our company. We don't have the time or manpower to worry about setting up and maintaining a complex streaming data mesh architecture. Tinybird handles our real‑time infrastructure at scale and allows us to build new real‑time applications using our existing skills.

Marc González

Director of Data at Factorial

3. Unifying batch and streaming data using Tinybird

To solve for data freshness, the team decided to switch to capturing changes in real-time from MySQL rather than a batch process like running a snapshot on a schedule. They used the MySQL CDC Source (Debezium) Connector for Confluent Cloud to implement Change Data Capture (CDC) over their production MySQL. The MySQL CDC Source (Debezium) Connector captures changes from a database and writes the changes to Apache Kafka®.

Additionally, Factorial's data team saw that Kafka could serve as a reliable buffer for data; if there were any failures downstream, data could buffer in Kafka and be retried. They assessed alternative tools, such as Amazon Kinesis and Google Pub/Sub. Still, they discovered that the offset semantics in Kafka proved more flexible, allowing them to more easily resume data consumers from a previous message in the stream.

Ultimately, the Factorial team chose to use Confluent Cloud, a fully managed Kafka implementation built and operated by the original creators of Kafka.

By sending the MySQL CDC stream to Kafka, Factorial eliminated the batch process that introduced significant latency to their data. To complement their streaming pipeline, they also needed a system that would allow their developers to combine their fresh Kafka streams with historical data to power user-facing applications and which could handle analytical queries in the order of milliseconds.

To make real-time analytical queries over data streams available to developers, Factorial chose Tinybird. Tinybird took away all of the operational overhead of managing real-time analytics data infrastructure at scale in production while fulfilling the strict requirements for latency and freshness. Tinybird allows Factorial to ingest from Confluent Cloud in real-time, using Tinybird's native Confluent connector.

Data streams from Confluent arrive in Tinybird and are enriched with historical data that lives in the platform, avoiding the limitations of stateful stream processors. This means that nearly all data processing can be performed at the time of ingestion, with the result being materialized and ready for developers to push into production features. With only two data engineers, Factorial reduced the average query time for production queries from minutes to sub-50 milliseconds. Tinybird APIs are integrated directly into Factorial's user-facing product, greatly simplifying their application architecture and saving on further infrastructure and tooling costs.

Factorial data architecture with Tinybird

4. Speed wins: Tinybird fuels time-to-market, a key differentiator for Factorial

Factorial completed a POC and launched their first production feature in one month, and over the next six months launched more than 12 user-facing product features that are powered by this real-time pipeline, with many more to come.

When we switched to Tinybird, we ran a PoC and shipped our first feature to production in a month. Since then, we've shipped 12 new user-facing features in just a few months. There's no way we could have done this without Tinybird.

Marc González

Director of Data at Factorial

Share this story!

Ship faster with Tinybird

The data infrastructure and tooling to ship enterprise grade analytical features faster and at a fraction of the cost

Try it for free