tinybird
/
guides

How to ingest JSON data

Intermediate
In this guide you'll learn how to ingest unstructured data, like JSON to Tinybird.

A typical scenario consists of having a document based database or using nested records on your data warehouse. In those cases, sometimes it is hard to flatten the records so they conform to a typical CSV file structure in which each field represents a column.

For cases like this, is easier to just export your ``JSON`` objects as if they were a ``String`` in a CSV file, ingest them to Tinybird and then use the builtin ``JSON`` functions to prepare the data for real-time analytics as it's being ingested.

Converting JSON files into CSV files

Let's use as an example a ``JSON`` object like this which contains several objects with arrays, objects and arrays of objects in each of them:

To generate a CSV file that you could import to Tinybird, you'd run the following command:

``jq '.data[] | [. | tostring] | @csv' user_events.json  -c  -r > user_events.csv``

This is what this command does:

  • ``js '.data[] -c`` extracts each one of the elements from the array under the ``data`` key, in the original JSON. The ``-c`` flag stands for compact, and it puts all the contents of each record in a single line. If your data is already in  NDJSON or JSON Lines format, this step is not necessary. This is done with jq, a powerful command-line tool to work with JSON data.
  • ``'[. | tostring] | @csv' -r`` escapes double quotes, using 2 double quotes, and outputs data in a CSV format. To be able to ingest it to a Tinybird Data Source, all you have to do is convert it to a ``String``, tacking into account that it needs to be double quoted to avoid the Tinybird CSV parser wrongly detecting delimiters
  • ``> user_events.csv`` saves the output to a file, overwriting its contents if it already exists.

Here you can see the better what each part does

If instead of a JSON with an array of records, you have multiple JSON with a single record like this:

the best way to ingest them to Tinybird would be to combine them first into a file with multiple elements, instead of ingesting files with only one event. You could to it with a script like this, that iterates over all the JSON files in a directory and converts them to a single-string-column CSV:

Ingesting it to Tinybird

Now you can just ingest it using the Import API as a regular CSV file:

Instead of duplicating double quotes, you can use a escape character, for instance a backslash (but if you can, use double quotes, as ingestions will be faster this way):

In this case, you have to provide the ``dialect_escapechar`` to the Import API like this:

Now you have a single column Data Source with the JSON String, you can further process.

Extracting values from the JSON String column

For instance to flatten the ``JSON`` you can run a SQL query like this one:

{% tip-box title="builtin JSON functions" %}You can see the available builtin JSON functions by typing ``JSON`` on any pipe from the Tinybird UI with integrated documentation and auto-complete.{% tip-box-end %}

Integrated JSON builtin functions and auto-complete dialog

This and other transformations can be done on-the-fly while data is being ingested using materialized views. Check out this video to learn how ingest JSON data and extract the values from it with a materialized view:

ON THIS GUIDE