tinybird
/
guides

How to ingest JSON data

Intermediate
In this guide you'll learn how to ingest unstructured data, like JSON to Tinybird.

A typical scenario consists of having a document based database or using nested records on your data warehouse. In those cases, sometimes it is hard to flatten the records so they conform to a typical CSV file structure in which each field represents a column.

For cases like this, is easier to just export your ``JSON`` objects as if they were a ``String`` in a CSV file, ingest them to Tinybird and then use the builtin ``JSON`` functions to prepare the data for real-time analytics as it's being ingested.

Converting JSON files into CSV files

Let's use as an example a ``JSON`` object like this which contains several objects with arrays, objects and arrays of objects in each of them:

To generate a CSV file that you could import to Tinybird, you'd run the following command:

``jq '.data[] | [. | tostring] | @csv' user_events.json  -c  -r > user_events.csv``

This is what this command does:

  • ``js '.data[] -c`` extracts each one of the elements from the array under the ``data`` key, in the original JSON. The ``-c`` flag stands for compact, and it puts all the contents of each record in a single line. If your data is already in  NDJSON or JSON Lines format, this step is not necessary. This is done with jq, a powerful command-line tool to work with JSON data.
  • ``'[. | tostring] | @csv' -r`` escapes double quotes, using 2 double quotes, and outputs data in a CSV format. To be able to ingest it to a Tinybird Data Source, all you have to do is convert it to a ``String``, tacking into account that it needs to be double quoted to avoid the Tinybird CSV parser wrongly detecting delimiters
  • ``> user_events.csv`` saves the output to a file, overwriting its contents if it already exists.

Here you can see the better what each part does

If you already have an NDJSON file, like after the first step in the previous snippet, to get the csv you'd just run ``jq '[. | tostring] | @csv' -r`` to format it in the right way.

After that, if you add ``> output_file.csv`` to the previous command you write the result to a file called ``output_file.csv``. To append data, do ``>>`` instead of ``>``.

If instead of a JSON with an array of records, you have multiple JSON with a single record like this:

the best way to ingest them to Tinybird would be to combine them first into a file with multiple elements, instead of ingesting files with only one event. You could to it with a script like this, that iterates over all the JSON files in a directory and converts them to a single-string-column CSV:

Ingesting it to Tinybird

Now you can just ingest it using the Import API as a regular CSV file:

Instead of duplicating double quotes, you can use a escape character, for instance a backslash (but if you can, use double quotes, as ingestions will be faster this way):

In this case, you have to provide the ``dialect_escapechar`` to the Import API like this:

Now you have a single column Data Source with the JSON String, you can further process.

Extracting values from the JSON String column

For instance to flatten the ``JSON`` you can run a SQL query like this one:

{% tip-box title="builtin JSON functions" %}You can see the available builtin JSON functions by typing ``JSON`` on any pipe from the Tinybird UI with integrated documentation and auto-complete.{% tip-box-end %}

Integrated JSON builtin functions and auto-complete dialog

This and other transformations can be done on-the-fly while data is being ingested using materialized views. Check out this video to learn how ingest JSON data and extract the values from it with a materialized view:

ON THIS GUIDE