How to ingest NDJSON data

In this guide you'll learn how to ingest unstructured data, such as NDJSON, into Tinybird.

Typical scenarios include having a document-based database, using nested records in your data warehouse, or generating events in JSON format from a web application.

In cases like these, the usual approach was to export the ``JSON`` objects as if they were a ``String`` in a CSV file, ingest them into Tinybird, and then use the built-in ``JSON`` functions to prepare the data for real-time analytics as it was being ingested.

This is no longer needed: Tinybird accepts JSON imports by default!

{% tip-box title="NDJSON and JSON support" %}Although Tinybird lets you ingest .json and .ndjson files, it only accepts Newline Delimited JSON as content: each line must be a valid JSON object and every line has to end with `\n`. The API returns an error if any line isn't a valid JSON value.{% tip-box-end %}
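If you want to check a file locally before sending it, the rule above can be approximated with a few lines of Python. This is only a sketch of the format rule as stated in this guide, not Tinybird's own validation; the function name `check_ndjson` and the sample records are made up for illustration.

```python
import json


def check_ndjson(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the text looks valid."""
    problems = []
    if not text.endswith("\n"):
        problems.append("last line does not end with a newline")
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece that follows the final newline
    for number, line in enumerate(lines, start=1):
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(value, dict):
            problems.append(f"line {number}: not a JSON object")
    return problems


# Hypothetical sample data, not taken from the guide's file.
good = '{"event": "view", "product_id": "a1"}\n{"event": "buy", "product_id": "b2"}\n'
bad = '{"event": "view"}\n[1, 2, 3]\n{"event": \n'

print(check_ndjson(good))
print(check_ndjson(bad))
```

A regular pretty-printed JSON document, or a single top-level array of objects, fails this check because its individual lines are not complete JSON objects.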

Ingesting it to Tinybird

This guide uses this 100k-row NDJSON file, which contains events from an ecommerce website with different properties.

With the API

Ingesting NDJSON files using the API is similar to ingesting CSV files. There are only two differences to manage in the query parameters:

  • format: it has to be “ndjson”
  • schema: besides the name and the type of every column, each column needs one more property, the `jsonpath` (see the JSONPath syntax). Example: “schema=event_name String `json:$.event.name`”
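As a rough illustration of the two parameters above, the following Python sketch builds the `schema` value and URL-encodes it together with `format`. The helper name `build_schema` is hypothetical, and joining several columns with commas is an assumption carried over from the CSV schema syntax; the sketch only builds the query string and does not call the API.

```python
from urllib.parse import parse_qs, urlencode

# (column name, column type, jsonpath) triples; the single column is the
# example from the text above.
columns = [
    ("event_name", "String", "$.event.name"),
]


def build_schema(columns):
    """Build a schema string where every column carries its jsonpath."""
    # Assumption: columns are comma-separated, as in the CSV schema syntax.
    return ", ".join(
        f"{name} {type_} `json:{path}`" for name, type_, path in columns
    )


params = {"format": "ndjson", "schema": build_schema(columns)}

# urlencode escapes spaces, backticks, "$" and ":" so the value survives
# being sent as a query parameter.
query = urlencode(params)
print(query)
```

Encoding matters here because the backticks and the `$` in the `jsonpath` are not safe to place in a URL as-is.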

You can guess the ``schema`` by first calling the Analyze API. It's a handy way to avoid having to remember the ``schema`` and ``jsonpath`` syntax: just send a sample of your file and the API will guess what's inside (columns, types, schema, a preview, etc.).

Take the ``schema`` attribute from the response and either use it as is in the next API request to create the Data Source, or modify it as you wish: rename columns, change types, remove columns, etc.

Now that we've analyzed the file, let's create the Data Source. In this example you will ingest the 100k-row NDJSON file taking only 3 columns from it: date, event, and product_id. The ``jsonpath`` allows Tinybird to match each Data Source column with its JSON property path.
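To make the column matching more concrete, here is a small Python sketch of how a `jsonpath` can map a JSON property to a column. It is not how Tinybird implements it: `resolve` is a hypothetical helper that only understands plain dotted paths such as `$.event.name`, and both the paths in `COLUMN_PATHS` and the sample record are assumptions, since the real layout of the example file is not shown here.

```python
import json


def resolve(record, jsonpath):
    """Resolve a plain dotted JSONPath (no wildcards, filters or indexes)."""
    if not jsonpath.startswith("$."):
        raise ValueError(f"unsupported jsonpath: {jsonpath}")
    value = record
    for key in jsonpath[2:].split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # property missing in this record
        value = value[key]
    return value


# Assumed paths for the three columns kept in this guide.
COLUMN_PATHS = {
    "date": "$.date",
    "event": "$.event",
    "product_id": "$.product_id",
}

# Made-up record with extra properties that are simply ignored.
line = '{"date": "2020-04-24 10:00:00", "event": "view", "product_id": "abc123", "city": "Madrid"}'
record = json.loads(line)

row = {column: resolve(record, path) for column, path in COLUMN_PATHS.items()}
print(row)
```

Properties that have no column with a matching path, such as `city` in this record, are left out of the resulting row.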

With the Command-line Interface

The CLI needs no changes to ingest an NDJSON file. Just run the same command you use with CSV.

Once it finishes, it automatically generates a .datasource file with all the columns, their proper types, and their ``jsonpaths``.

Then you can push that .datasource file to Tinybird to start using it in your Pipes or appending new data to it.

With the User Interface

To create a new Data Source from an NDJSON file, go to your dashboard and click the "Add Data Source" button.

The import modal will allow you to select an NDJSON formatted file (with either .ndjson or .json extension) from your computer or provide a URL like the one we are using for this guide.

Import modal

After you select an NDJSON file or URL, Tinybird will analyze the content and provide a preview of your data. Besides changing the name or the type of a column, you can exclude it from the import or check its related JSON path.

Preview step

{% tip-box title="JSON Tree view" %}You can preview your JSON data with the Tree view available in this step. Just click the icon at the top right of the table.{% tip-box-end %}

Once your data is imported, you will have a Data Source with your JSON data structured in columns, easy to transform and consume in any Pipe.

Data Source created with data

{% tip-box title="INGEST JUST THE COLUMNS YOU NEED" %} After exploring your data, remember to create a Data Source that has only the columns required for your analyses. That will help make your ingestion, materialization, and real-time data project faster. {% tip-box-end %}

Dealing with new JSON fields

One of the features included with the new NDJSON import is automatic detection of new JSON properties in incoming data.

Using the Data Source imported in the previous section, we will add a new property that holds the origin country of the event, complementing the city. Let's append new data with that property included in the JSON (example file).
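The idea behind this detection can be sketched in Python as a comparison between the properties of incoming records and the columns already known. This is only an illustration of the concept, not Tinybird's implementation: `find_new_properties`, the set of known columns, and the sample records are all hypothetical, and only top-level properties are inspected.

```python
import json


def find_new_properties(known_columns, ndjson_text):
    """Return top-level properties not in known_columns, in first-seen order."""
    new = []
    for line in ndjson_text.split("\n"):
        if not line.strip():
            continue  # skip the empty piece after the final newline
        record = json.loads(line)
        for key in record:
            if key not in known_columns and key not in new:
                new.append(key)
    return new


# Assumed columns of the existing Data Source.
known = {"date", "event", "product_id", "city"}

# Made-up rows to append; the second one carries the new "country" property.
incoming = (
    '{"date": "2020-04-24 10:00:00", "event": "view", "product_id": "a1", "city": "Madrid"}\n'
    '{"date": "2020-04-24 10:05:00", "event": "buy", "product_id": "b2", "city": "Lyon", "country": "France"}\n'
)

print(find_new_properties(known, incoming))
```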

After the import finishes, open the Data Source modal and check that a blue banner appears, warning you about the new properties detected in the last ingestion.

Automatically suggesting new columns

Once you choose to view those new columns, the application offers to add them and lets you change each column's type and name, as in the preview step of the import.

Accepting new columns

From now on, whenever you append data in which the new column is defined and has a value, it will appear in the results of the Data Source and be available for use in your Pipes.

New column receiving data