# Chapter 1: Idea to Production

## What you'll build
In this chapter, you'll build the first version of your data project locally and deploy it to Tinybird Cloud.
The desired features are:

- `total-revenue`: for the merchant to see their total revenue in real time.
- `products-usually-bought-together`: to increase ticket size based on other buyers' behavior.
## Data structure
You'll work with two event types:
- `orders`: overall order information
- `order_items`: items within each order
### Example data
Orders:
{"storeId": 1, "orderId": "order123", "userId": "user456", "timestamp": "2023-04-24T10:30:00Z", "totalAmount": 125.50} {"storeId": 1, "orderId": "order124", "userId": "user789", "timestamp": "2023-04-24T11:45:00Z", "totalAmount": 75.25}
Order Items:
{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]} {"storeId": 1, "orderId": "order124", "items": [{"productId": "prod3", "quantity": 1, "priceAtTime": 50.25}, {"productId": "prod2", "quantity": 1, "priceAtTime": 25.00}]}
## Set up the project
First, create a new directory for your Tinybird project. `tb create` will create the scaffolding for you.

```sh
mkdir ecommerce-analytics && cd ecommerce-analytics
tb create

» Creating new project structure...
✓ /datasources
✓ /endpoints
✓ /materializations
✓ /copies
✓ /pipes
✓ /fixtures
✓ /tests
✓ /connections
✓ Scaffolding completed!
» Creating CI/CD files for GitHub and GitLab...
✓ /.gitignore
✓ .github/workflows/tinybird-ci.yml
✓ .github/workflows/tinybird-cd.yml
✓ ./.gitlab-ci.yml
✓ .gitlab/tinybird/tinybird-ci.yml
✓ .gitlab/tinybird/tinybird-cd.yml
✓ Done!
» Creating .cursorrules...
✓ Done!
```
You just created the folder structure (/datasources, /endpoints, /fixtures…) where the code will be stored. Also note the CI/CD files; you're prototyping now, so don't pay too much attention to them. You'll need them when you go to production.
## Inserting and storing data
Next, define the tables and how the events are going to be stored in them.
Create a tmp-seed-data directory, and add samples for orders and order-items events.
```sh
mkdir tmp-seed-data

echo '{"storeId": 1, "orderId": "order123", "userId": "user456", "timestamp": "2023-04-24T10:30:00Z", "totalAmount": 125.50}
{"storeId": 1, "orderId": "order124", "userId": "user789", "timestamp": "2023-04-24T11:45:00Z", "totalAmount": 75.25}' > tmp-seed-data/orders.ndjson

echo '{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
{"storeId": 1, "orderId": "order124", "items": [{"productId": "prod3", "quantity": 1, "priceAtTime": 50.25}, {"productId": "prod2", "quantity": 1, "priceAtTime": 25.00}]}' > tmp-seed-data/order-items.ndjson
```
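NDJSON requires exactly one JSON object per line. As an optional sanity check (a minimal sketch, assuming you have `jq` installed), you can confirm that every line parses:

```sh
# Each line should echo back as a compact JSON object;
# a parse error points at a malformed line.
jq -c . tmp-seed-data/orders.ndjson
jq -c . tmp-seed-data/order-items.ndjson
```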
Create two .datasource files. Data sources define the database tables where you will store the order events: they specify how to access the incoming data and how to store it.
You can write them directly if you know the syntax, or use `tb create` with the `--data` flag to pass the path to sample data.
```sh
tb create --data tmp-seed-data/orders.ndjson

» Creating resources...
✓ /datasources/orders.datasource
» Generating fixtures...
✓ /fixtures/orders

tb create --data tmp-seed-data/order-items.ndjson

» Creating resources...
✓ /datasources/order_items.datasource
✓ Done!
» Generating fixtures...
✓ /fixtures/order_items
```
If you don't have sample data, you can use `--prompt` to pass an LLM prompt that generates a .datasource file and a fixture.
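For example, something along these lines (the prompt text is just an illustration; the generated schema will depend on how you describe your events):

```sh
# Hypothetical prompt; describe your own event shape.
tb create --prompt "An orders data source with storeId, orderId, userId, an ISO timestamp, and a totalAmount float"
```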
Now that the CLI created `/fixtures`, you're good to delete `tmp-seed-data`.
The generated .datasource files should look like this:
datasources/orders.datasource

```
SCHEMA >
    orderId String `json:$.orderId`,
    storeId UInt32 `json:$.storeId`,
    timestamp String `json:$.timestamp`,
    totalAmount Float32 `json:$.totalAmount`,
    userId String `json:$.userId`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "userId, orderId"
```
- SCHEMA defines the column names, their types, and the JSONPath used to extract each column value from the incoming JSON.
- ENGINE is set to MergeTree, the default table engine, which is append only.
- ENGINE_SORTING_KEY defines the order in which the data is physically stored, so filters on those columns read less data (see the quick check below).
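To see the sorting key at work, filter on `userId`, its leading column; the engine can then scan only the relevant range of rows instead of the whole table. A quick check against Tinybird Local, using a userId from the fixtures:

```sh
# Filters on the leading sorting-key column are served by
# range scans rather than full-table scans.
tb sql "SELECT orderId, totalAmount FROM orders WHERE userId = 'user456'"
```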
datasources/order_items.datasource

```
SCHEMA >
    orderId String `json:$.orderId`,
    storeId UInt32 `json:$.storeId`,
    items__productId Array(String) `json:$.items[:].productId`,
    items__priceAtTime Array(Float32) `json:$.items[:].priceAtTime`,
    items__quantity Array(Int16) `json:$.items[:].quantity`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "orderId, storeId"
```
Similar to the previous one, but note the Array types and the `$.items[:]` JSONPaths that convert this JSON:
{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
Into this row:
| storeId | orderId  | items__productId | items__priceAtTime | items__quantity |
|---------|----------|------------------|--------------------|-----------------|
| 1       | order123 | [prod1, prod2]   | [45.00, 35.50]     | [2, 1]          |
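To see how those arrays behave in queries, you can unnest them with ClickHouse's ARRAY JOIN, which produces one row per array element. The recommendation endpoint later in this chapter relies on exactly this:

```sh
# One output row per (orderId, productId) combination.
tb sql "SELECT orderId, prod FROM order_items ARRAY JOIN items__productId AS prod"
```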
## Validate the project locally
Run the Docker image from the CLI with `tb local start`. You'll need OrbStack, Docker Desktop, or the container runtime of your preference.
```sh
tb local start

» Starting Tinybird Local...
✓ Tinybird Local is ready!
```
Build the project to validate your datafiles. If something is wrong (a syntax error, for example), you'll receive an error and can correct it before deploying to the cloud.
```sh
tb build

» Building project...
✓ datasources/orders.datasource created
✓ datasources/order_items.datasource created
✓ Build completed in 0.3s
```
The build step also ingests your fixtures so you can test locally with data. Verify the data is OK, too:
tb sql "select * from orders" Running against Tinybird Local storeId orderId timestamp totalAmount userId UInt32 String String Float32 String ─────────────────────────────────────────────────────────────── 1 order123 2023-04-24T10:30:00Z 125.5 user456 ─────────────────────────────────────────────────────────────── 1 order124 2023-04-24T11:45:00Z 75.25 user789
Create the "total_revenue" endpoint¶
Now, create your first endpoint to calculate total revenue metrics. Endpoints are .pipe files, a convenient way to chain SQL queries together like a notebook. The `TYPE` command states the desired behavior: use `endpoint` to expose an API endpoint that you can call from other services.
endpoints/total_revenue.pipe

```
NODE total_revenue
SQL >
    SELECT
        count() AS orders,
        sum(totalAmount) AS revenue,
        revenue / orders AS average_order_value
    FROM orders

TYPE endpoint
```
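A detail worth noticing: `revenue / orders` works because ClickHouse lets you reference aliases defined earlier in the same SELECT list. You can try the node's query directly against Tinybird Local before publishing:

```sh
tb sql "SELECT count() AS orders, sum(totalAmount) AS revenue, revenue / orders AS average_order_value FROM orders"
```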
## Test the API endpoint
Verify the API endpoint works as expected. You need the URL (by default, tb local runs on port 7181) and the admin user token to authenticate. The path is `/v0/pipes/<endpoint name>.<format>`, and the token can be passed either as a query parameter (`?token=<token>`) or in a header.
```sh
tb token copy "admin local_testing@tinybird.co" && TB_LOCAL_TOKEN=$(pbpaste)

curl -X GET "http://localhost:7181/v0/pipes/total_revenue.json?token=$TB_LOCAL_TOKEN"
```

```json
{
  "meta": [
    { "name": "orders", "type": "UInt64" },
    { "name": "revenue", "type": "Float64" },
    { "name": "average_order_value", "type": "Float64" }
  ],
  "data": [
    { "orders": 2, "revenue": 200.75, "average_order_value": 100.375 }
  ],
  "rows": 1,
  "statistics": { "elapsed": 0.005939524, "rows_read": 2, "bytes_read": 8 }
}
```
Check `meta` and `statistics`, and more importantly, the `data` array: 2 orders, the numbers are OK, you're good to go.
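If you'd rather keep the token out of the URL, the same request works with a bearer token header:

```sh
curl -H "Authorization: Bearer $TB_LOCAL_TOKEN" \
  "http://localhost:7181/v0/pipes/total_revenue.json"
```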
Create the "products_usually_bought_together" endpoint¶
Next, create your product recommendation endpoint. The idea is to end up with a table that contains the product the buyer is looking at, and a ranking of the products that are bought with it.
So, for these items:
["prod1", "prod2"] ["prod3", "prod2"] ["prod2", "prod6", "prod1"] ["prod2", "prod1"] ["prod7"]
We want a table like this:
```
productId, boughtWith, count
----------------------------
prod1, prod2, 3
prod1, prod6, 1
prod2, prod1, 3
prod2, prod3, 1
prod2, prod6, 1
prod3, prod2, 1
prod6, prod1, 1
prod6, prod2, 1
```
There isn't enough sample data in `order-items.ndjson`, so let's replace the rows currently in the data source with a set that matches the example above:
order-items.ndjson

```json
{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
{"storeId": 1, "orderId": "order124", "items": [{"productId": "prod3", "quantity": 1, "priceAtTime": 50.25}, {"productId": "prod2", "quantity": 1, "priceAtTime": 25.00}]}
{"storeId": 1, "orderId": "order125", "items": [{"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}, {"productId": "prod1", "quantity": 1, "priceAtTime": 45.00}]}
{"storeId": 1, "orderId": "order126", "items": [{"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}, {"productId": "prod6", "quantity": 2, "priceAtTime": 75.00}, {"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}]}
{"storeId": 1, "orderId": "order127", "items": [{"productId": "prod7", "quantity": 1, "priceAtTime": 40.00}]}
```
```sh
tb datasource truncate order_items --yes && tb datasource append order_items order-items.ndjson

Running against Tinybird Local
** Data Source 'order_items' truncated

Running against Tinybird Local
Importing data to order_items...
✓ Done!

tb sql "select items__productId from order_items"

Running against Tinybird Local

 items__productId
 Array(String)
─────────────────────────────
 ['prod1', 'prod2']
─────────────────────────────
 ['prod3', 'prod2']
─────────────────────────────
 ['prod2', 'prod1']
─────────────────────────────
 ['prod2', 'prod6', 'prod1']
─────────────────────────────
 ['prod7']
```
The data is ready, so let's proceed with the .pipe and SQL logic. Node by node:
- orders_multiprod: first, filter out the orders that only have 1 product.
- unrolled_prods: flatten the arrays to generate one row per array element.
- prod_pairs: take the products of the same order in sets of 2.
- ranking: count how often each pair occurs.
endpoints/products_bought_together.pipe

```
NODE orders_multiprod
SQL >
    SELECT orderId, items__productId
    FROM order_items
    WHERE length(items__productId) > 1

NODE unrolled_prods
SQL >
    SELECT orderId, prods
    FROM orders_multiprod
    ARRAY JOIN items__productId AS prods

NODE prod_pairs
SQL >
    SELECT DISTINCT t1.orderId, t1.prods AS product1, t2.prods AS product2
    FROM unrolled_prods t1
    JOIN unrolled_prods t2 ON t1.orderId = t2.orderId
    WHERE product1 != product2

NODE ranking
SQL >
    SELECT product1, product2, count() AS pair_frequency
    FROM prod_pairs
    GROUP BY product1, product2
    ORDER BY product1, product2

TYPE endpoint
```
Tip: as this pipe is more complex, checking the results of each node in the UI is super helpful, so I'd run `tb dev --ui` and edit the nodes there.
As always, verify the API endpoint works by making a request:
curl -X GET "http://localhost:7181/v0/pipes/products_bought_together.csv?token=$TB_LOCAL_TOKEN" "product1","product2","pair_frequency" "prod1","prod2",3 "prod1","prod6",1 "prod2","prod1",3 "prod2","prod3",1 "prod2","prod6",1 "prod3","prod2",1 "prod6","prod1",1 "prod6","prod2",1
## Deploy to Tinybird Cloud
You have the endpoints, so you're ready to deploy to production. Use the Tinybird CLI to push your changes. Since your project is not in production yet, it is safe to YOLO and deploy directly with `tb --cloud deploy`. However, it is good practice to run with `--check` first.
```sh
tb --cloud deploy --check

Running against Tinybird Cloud: Workspace ecom

» Validating deployment...

* Changes to be deployed:

-------------------------------------------------------------------------------
| status | name                     | path                                    |
-------------------------------------------------------------------------------
| new    | orders                   | datasources/orders.datasource           |
| new    | order_items              | datasources/order_items.datasource      |
| new    | products_bought_together | endpoints/products_bought_together.pipe |
| new    | total_revenue            | endpoints/total_revenue.pipe            |
-------------------------------------------------------------------------------

* No changes in tokens to be deployed

✓ Deployment is valid
```
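If the check passes, the same command without the flag performs the actual deployment:

```sh
tb --cloud deploy
```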
## Conclusion
You started with some sample data, created a local prototype, and now have a working project in production.
Next steps:
- Send data to Tinybird Cloud (a sketch follows this list).
- Add query parameters to make the API dynamic (also sketched below). Filter on a store so you don't expose data from store 3 to store 2's owner.
- Secure the endpoint with a token that is not the admin token.
- Optimize the project. You want the API calls to be fast.
- Get ready for production with testing and CI/CD. You want your existing APIs to keep working while you develop new features.
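To give you a head start on the first two items, here is a minimal sketch of sending an event to the Events API. The exact host depends on your workspace region, and `$TB_CLOUD_TOKEN` stands in for a Cloud token with append permissions:

```sh
# Hypothetical event; the Events API ingests NDJSON posted to /v0/events.
curl -X POST "https://api.tinybird.co/v0/events?name=orders" \
  -H "Authorization: Bearer $TB_CLOUD_TOKEN" \
  -d '{"storeId": 1, "orderId": "order200", "userId": "user999", "timestamp": "2023-04-25T09:00:00Z", "totalAmount": 42.00}'
```

And a sketch of how a query parameter could look in the total_revenue pipe, using Tinybird's templating. The leading `%` enables templates in the node, and `store_id` is an assumed parameter name with a default of 1:

```
NODE total_revenue
SQL >
    %
    SELECT
        count() AS orders,
        sum(totalAmount) AS revenue,
        revenue / orders AS average_order_value
    FROM orders
    WHERE storeId = {{ UInt32(store_id, 1) }}

TYPE endpoint
```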