Chapter 1: Idea to Production

What you'll build

In this chapter, you'll build the first version of your data project locally and deploy it to Tinybird Cloud.

The desired features are:

  • total-revenue: for the merchant to see their total revenue in real time.
  • products-usually-bought-together: to increase ticket size based on other buyers' behavior.

Data structure

You'll work with two event types:

  • orders: Overall order information
  • order_items: Items within each order

Example data

Orders:

{"storeId": 1, "orderId": "order123", "userId": "user456", "timestamp": "2023-04-24T10:30:00Z", "totalAmount": 125.50}
{"storeId": 1, "orderId": "order124", "userId": "user789", "timestamp": "2023-04-24T11:45:00Z", "totalAmount": 75.25}

Order Items:

{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
{"storeId": 1, "orderId": "order124", "items": [{"productId": "prod3", "quantity": 1, "priceAtTime": 50.25}, {"productId": "prod2", "quantity": 1, "priceAtTime": 25.00}]}

1. Set up the project

First, create a new directory for your Tinybird project. tb create will create the scaffolding for you.

mkdir ecommerce-analytics && cd ecommerce-analytics
tb create

» Creating new project structure...
✓ /datasources
✓ /endpoints
✓ /materializations
✓ /copies
✓ /pipes
✓ /fixtures
✓ /tests
✓ /connections
✓ Scaffolding completed!

» Creating CI/CD files for GitHub and GitLab...
✓ /.gitignore
✓ .github/workflows/tinybird-ci.yml
✓ .github/workflows/tinybird-cd.yml
✓ ./.gitlab-ci.yml
✓ .gitlab/tinybird/tinybird-ci.yml
✓ .gitlab/tinybird/tinybird-cd.yml
✓ Done!

» Creating .cursorrules...
✓ Done!

You just created the folder structure (/datasources, /endpoints, /fixtures…) where the code will live. Also note the CI/CD files; you're prototyping now, so don't pay too much attention to them. You'll need them when you go to production.

2. Inserting and storing data

Next, define the tables and how the events are going to be stored in them.

Create a tmp-seed-data directory, and add samples for orders and order-items events.

mkdir tmp-seed-data
touch tmp-seed-data/orders.ndjson

echo '{"storeId": 1, "orderId": "order123", "userId": "user456", "timestamp": "2023-04-24T10:30:00Z", "totalAmount": 125.50}
{"storeId": 1, "orderId": "order124", "userId": "user789", "timestamp": "2023-04-24T11:45:00Z", "totalAmount": 75.25}' > tmp-seed-data/orders.ndjson

touch tmp-seed-data/order-items.ndjson

echo '{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
{"storeId": 1, "orderId": "order124", "items": [{"productId": "prod3", "quantity": 1, "priceAtTime": 50.25}, {"productId": "prod2", "quantity": 1, "priceAtTime": 25.00}]}' > tmp-seed-data/order-items.ndjson

Create two .datasource files. Data sources define the database tables where you will store the order events: they specify how the incoming data is parsed and how it is stored.

You can write them by hand if you know the syntax, or use tb create with the --data flag to pass the path to your sample data.

tb create --data tmp-seed-data/orders.ndjson 

» Creating resources...
✓ /datasources/orders.datasource

» Generating fixtures...
✓ /fixtures/orders

tb create --data tmp-seed-data/order-items.ndjson

» Creating resources...
✓ /datasources/order_items.datasource
✓ Done!

» Generating fixtures...
✓ /fixtures/order_items

If you don't have sample data, you can use --prompt to pass an LLM prompt that generates a .datasource file and a fixture.
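
For example, something like this (the prompt text is just an illustration, adapt it to your case):

tb create --prompt "An orders data source with storeId, orderId, userId, timestamp and totalAmount fields"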

Now that the CLI created /fixtures, you're good to delete tmp-seed-data.
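
If you want to clean up right away:

rm -rf tmp-seed-data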

The two .datasource files should look like this:

datasources/orders.datasource
SCHEMA >
	orderId String `json:$.orderId`,
	storeId UInt32 `json:$.storeId`,
	timestamp String `json:$.timestamp`,
	totalAmount Float32 `json:$.totalAmount`,
	userId String `json:$.userId`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "userId, orderId"

SCHEMA defines the column names, their types, and the JSONPath used to extract each value from the incoming JSON.

ENGINE is set to MergeTree, the default append-only table engine.

ENGINE_SORTING_KEY defines the order in which the data is physically stored.
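
In practice, the sorting key determines which queries read the least data. Once the project is built locally (next step), a lookup that filters on the leading sorting key column only scans the matching part of the table, for example:

tb sql "SELECT orderId, totalAmount FROM orders WHERE userId = 'user456'"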

datasources/order_items.datasource
SCHEMA >
	orderId String `json:$.orderId`,
	storeId UInt32 `json:$.storeId`,
	items__productId Array(String) `json:$.items[:].productId`,
	items__priceAtTime Array(Float32) `json:$.items[:].priceAtTime`,
	items__quantity Array(Int16) `json:$.items[:].quantity`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "orderId, storeId"

Similar to the previous one, but note the Array types and jsonpaths that will convert this JSON:

{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}

Into this row:

| storeId | orderId  | items__productId | items__priceAtTime | items__quantity |
|---------|----------|------------------|--------------------|-----------------|
|      1  | order123 | [prod1, prod2]   | [45.00, 35.50]     | [2, 1]          |

3. Validate the project locally

Run the Tinybird Local container from the CLI with tb local start. You'll need OrbStack, Docker Desktop, or whichever container runtime you prefer.

tb local start

» Starting Tinybird Local...
✓ Tinybird Local is ready!

Build the project to validate your datafiles. If something is wrong (a syntax error, for example), you'll get an error you can fix before deploying to the cloud.

tb build

» Building project...
✓ datasources/orders.datasource created
✓ datasources/order_items.datasource created

✓ Build completed in 0.3s

The build step also ingests your fixtures so you can test locally with data. Verify the data is there, too:

tb sql "select * from orders"

Running against Tinybird Local
 storeId  orderId   timestamp             totalAmount  userId
 UInt32   String    String                Float32      String
───────────────────────────────────────────────────────────────
      1    order123  2023-04-24T10:30:00Z  125.5        user456
───────────────────────────────────────────────────────────────
      1    order124  2023-04-24T11:45:00Z  75.25        user789
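
The array columns in order_items can be checked the same way. ClickHouse array functions are handy here; for example, this query (just a sanity check, not required by the tutorial) computes each order's item total from the parallel arrays:

tb sql "SELECT orderId, arraySum(arrayMap((p, q) -> p * q, items__priceAtTime, items__quantity)) AS items_total FROM order_items"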

4. Create the "total_revenue" endpoint

Now, create your first endpoint to calculate total revenue metrics. Endpoints are .pipe files, a convenient way to chain SQL queries together, like a notebook. The TYPE directive states the desired behavior for the pipe: use endpoint to expose an API endpoint that you can call from other services.

endpoints/total_revenue.pipe
NODE total_revenue
SQL >
    SELECT 
        count() AS orders,
        sum(totalAmount) AS revenue,
        revenue / orders AS average_order_value
    FROM orders

TYPE endpoint
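
Note that the earlier tb build ran before this pipe existed. If the endpoint isn't found when you call it in the next step, and you're not running a dev session that rebuilds on change, re-run the build:

tb build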

5. Test the API endpoint

Verify the API endpoint works as expected. You need the URL (by default, tb local runs on port 7181) and a token to authenticate, in this case the admin user token. The path is /v0/pipes/<endpoint name>.<format>, and the token can be passed as a query parameter or in a header.

tb token copy "admin local_testing@tinybird.co" && TB_LOCAL_TOKEN=$(pbpaste)

curl -X GET "http://localhost:7181/v0/pipes/total_revenue.json?token=$TB_LOCAL_TOKEN"
{
    "meta": [
        {
            "name": "orders",
            "type": "UInt64"
        },
        {
            "name": "revenue",
            "type": "Float64"
        },
        {
            "name": "average_order_value",
            "type": "Float64"
        }
    ],
    "data": [
        {
            "orders": 2,
            "revenue": 200.75,
            "average_order_value": 100.375
        }
    ],
    "rows": 1,
    "statistics": {
        "elapsed": 0.005939524,
        "rows_read": 2,
        "bytes_read": 8
    }
}

Check meta and statistics and, more importantly, the data array: 2 orders, revenue 200.75, an average order value of 100.375. The numbers are OK, so you're good to go.
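
If you prefer to keep the token out of the URL, you can send it in a header instead:

curl -H "Authorization: Bearer $TB_LOCAL_TOKEN" "http://localhost:7181/v0/pipes/total_revenue.json"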

6. Create the "products_usually_bought_together" endpoint

Next, let's create the product recommendation endpoint.

The idea is to end up with a table that contains the product a buyer is looking at and a ranking of the products most often bought together with it.

So, for these items:

["prod1", "prod2"]
["prod3", "prod2"]
["prod2", "prod6", "prod1"]
["prod2", "prod1"]
["prod7"]

We want a table like this:

productId, boughtWith, count
----------------------------
prod1, prod2, 3
prod1, prod6, 1
prod2, prod1, 3
prod2, prod3, 1
prod2, prod6, 1
prod3, prod2, 1
prod6, prod1, 1
prod6, prod2, 1

There isn't enough sample data in order-items.ndjson, so let's remove the existing rows and add new ones:

order-items.ndjson
{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
{"storeId": 1, "orderId": "order124", "items": [{"productId": "prod3", "quantity": 1, "priceAtTime": 50.25}, {"productId": "prod2", "quantity": 1, "priceAtTime": 25.00}]}
{"storeId": 1, "orderId": "order126", "items": [{"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}, {"productId": "prod6", "quantity": 2, "priceAtTime": 75.00}, {"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}]}
{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
{"storeId": 1, "orderId": "order124", "items": [{"productId": "prod3", "quantity": 1, "priceAtTime": 50.25}, {"productId": "prod2", "quantity": 1, "priceAtTime": 25.00}]}
{"storeId": 1, "orderId": "order126", "items": [{"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}, {"productId": "prod6", "quantity": 2, "priceAtTime": 75.00}, {"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}]}
{"storeId": 1, "orderId": "order127", "items": [{"productId": "prod7", "quantity": 1, "priceAtTime": 40.00}]}
tb datasource truncate order_items --yes && tb datasource append order_items order-items.ndjson

Running against Tinybird Local
** Data Source 'order_items' truncated
Running against Tinybird Local
Importing data to order_items...
✓ Done!

tb sql "select items__productId from order_items"
Running against Tinybird Local
  items__productId             
  Array(String)                
───────────────────────────────
  ['prod1', 'prod2']           
───────────────────────────────
  ['prod3', 'prod2']           
───────────────────────────────
  ['prod2', 'prod6', 'prod1']  
───────────────────────────────
  ['prod1', 'prod2']           
───────────────────────────────
  ['prod3', 'prod2']           
───────────────────────────────
  ['prod2', 'prod6', 'prod1']  
───────────────────────────────
  ['prod7']                  

The data is ready, so let's build the .pipe and its SQL logic, node by node:

  • orders_multiprod: filter out the orders that only have one product.
  • unrolled_prods: flatten the arrays to generate one row per array element.
  • prod_pairs: pair up the products that belong to the same order.
  • ranking: count how often each pair occurs.

endpoints/products_bought_together.pipe
NODE orders_multiprod

SQL >
    SELECT orderId, items__productId
    FROM order_items
    WHERE length(items__productId) > 1

NODE unrolled_prods

SQL >
    SELECT orderId, prods
    FROM orders_multiprod
    ARRAY JOIN items__productId AS prods

NODE prod_pairs

SQL >
    SELECT DISTINCT t1.orderId, t1.prods AS product1, t2.prods AS product2
    FROM unrolled_prods t1
    JOIN unrolled_prods t2 ON t1.orderId = t2.orderId
    WHERE product1 != product2

NODE ranking

SQL >
    SELECT product1, product2, count() AS pair_frequency
    FROM prod_pairs
    GROUP BY product1, product2
    ORDER BY product1, product2

TYPE endpoint

Tip: as this pipe is more complex, checking the results of each node in the UI is super helpful, so I'd run tb dev --ui and edit the nodes there.

As always, verify the API endpoint works by making a request:

curl -X GET "http://localhost:7181/v0/pipes/products_bought_together.csv?token=$TB_LOCAL_TOKEN"

"product1","product2","pair_frequency"
"prod1","prod2",3
"prod1","prod6",1
"prod2","prod1",3
"prod2","prod3",1
"prod2","prod6",1
"prod3","prod2",1
"prod6","prod1",1
"prod6","prod2",1

7. Deploy to Tinybird Cloud

You have the endpoints, so you're ready to deploy to production. Use the Tinybird CLI to push your changes.

Since your project is not in production yet, it is safe to YOLO and deploy directly with tb --cloud deploy. However, it is good practice to run with --check first.

tb --cloud deploy --check

Running against Tinybird Cloud: Workspace ecom

» Validating deployment...


* Changes to be deployed:
---------------------------------------------------------------------------------
| status | name                      | path                                     |
---------------------------------------------------------------------------------
| new    | orders                    | datasources/orders.datasource            |
| new    | order_items               | datasources/order_items.datasource       |
| new    | products_bought_together  | endpoints/products_bought_together.pipe  |
| new    | total_revenue             | endpoints/total_revenue.pipe             |
---------------------------------------------------------------------------------
* No changes in tokens to be deployed


✓ Deployment is valid
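
Once the check passes, deploy for real by dropping the flag:

tb --cloud deploy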

Conclusion

You started with some sample data, built the project locally, and now have it working in production.

Next steps:

  • Send data to Tinybird Cloud.
  • Add query parameters to make the API dynamic. Filter on a store so you don't expose data from store 3 to store 2's owner (see the sketch after this list).
  • Secure the endpoint with a token that is not the admin token.
  • Optimize the project. You want the API calls to be fast.
  • Get ready for production with testing and CI/CD. You want your existing APIs to keep working while you develop new features.
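
As a preview of the query parameters item, here is a minimal sketch of a parameterized total_revenue pipe using Tinybird's templating syntax (store_id is a hypothetical parameter name, not something defined in this chapter):

endpoints/total_revenue.pipe
NODE total_revenue
SQL >
    %
    SELECT
        count() AS orders,
        sum(totalAmount) AS revenue,
        revenue / orders AS average_order_value
    FROM orders
    WHERE storeId = {{ UInt32(store_id, 1) }}

TYPE endpoint

The % on the first line enables templating, and UInt32(store_id, 1) declares a typed parameter with a default value of 1.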