URL query parameters often hold information about user behavior, campaign performance, and application state, but extracting specific parameter values from URL strings in ClickHouse requires knowing which functions to use and how they handle edge cases.
This guide covers ClickHouse's three main parameter extraction functions, performance optimization techniques for billion-row datasets, and practical examples for building production analytics pipelines that reliably parse URL parameters at scale.
Get a single query parameter in ClickHouse
To extract the value of a specific query parameter from a URL in ClickHouse, use the extractURLParameter function. The syntax is extractURLParameter(url_string, parameter_name), where both arguments are strings; the function returns the parameter value, or an empty string if the parameter is not found.
ClickHouse gives you three main functions for pulling query parameters out of URL strings. Each one works a bit differently depending on what you're trying to accomplish.
1. extractURLParameter(url, 'param')
The extractURLParameter function is your go-to choice when you know exactly which parameter you want. Here's how it works with a typical web analytics URL:
SELECT extractURLParameter('https://example.com/page?utm_source=google&utm_medium=cpc', 'utm_source') AS source;
-- Returns: google
This function is straightforward and readable, which makes it perfect for most use cases where you're after a single parameter value.
2. url_extract_parameter(url, 'param')
The url_extract_parameter function does exactly the same thing as extractURLParameter but uses a snake_case naming convention. Both functions take identical arguments and return the same results:
SELECT url_extract_parameter('https://shop.example.com/product?id=123&color=red', 'color') AS product_color;
-- Returns: red
You'll see both function names in existing codebases, though extractURLParameter appears more frequently in documentation and examples.
3. extractURLParameters(url)
When you need every parameter from the same URL, extractURLParameters extracts all of them at once as an array of key=value strings:
SELECT extractURLParameters('https://api.example.com/search?q=analytics&page=2&limit=50') AS params;
-- Returns: ['q=analytics', 'page=2', 'limit=50']
This approach saves processing time compared to calling extractURLParameter multiple times on the same URL.
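If you prefer key-based access, one option is to parse the query string into a map. This is a sketch assuming a ClickHouse version that ships the extractKeyValuePairs and queryString functions; verify availability and argument order on your server before relying on it:
-- Parse the query string into a Map(String, String), then access values by key
SELECT extractKeyValuePairs(
    queryString('https://api.example.com/search?q=analytics&page=2&limit=50'),
    '=', '&'
)['q'] AS query;
-- Returns: 'analytics'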
Function signatures and return types
The technical details of these functions matter when you're building robust queries that handle edge cases properly.
Argument types
Both URL and parameter name arguments require String type. The functions won't automatically convert other data types, so you'll need explicit casting if your URLs are stored differently:
-- This works
SELECT extractURLParameter(url_column, 'campaign') FROM web_logs;
-- This requires casting
SELECT extractURLParameter(CAST(url_bytes AS String), 'campaign') FROM raw_logs;
Parameter names are case-sensitive and match exactly as they appear in the URL query string.
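A quick illustration of that case sensitivity, following directly from exact string matching:
-- Parameter names must match exactly as written in the URL
SELECT
    extractURLParameter('https://example.com/page?UTM_Source=google', 'utm_source') AS lowercase_name, -- ''
    extractURLParameter('https://example.com/page?UTM_Source=google', 'UTM_Source') AS exact_name;     -- 'google'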
Nullable behaviour
The extraction functions return an empty string ('') when a parameter doesn't exist, not NULL. This behavior affects how you filter and process results:
-- Empty string for missing parameters
SELECT extractURLParameter('https://example.com/page', 'missing_param');
-- Returns: '' (empty string, not NULL)
-- Check for missing parameters
SELECT * FROM web_logs
WHERE extractURLParameter(url, 'utm_source') != '';
However, if the URL value itself is NULL, the function returns NULL instead of an empty string.
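You can confirm the NULL propagation directly with a NULL literal:
-- A NULL input produces a NULL output
SELECT extractURLParameter(CAST(NULL, 'Nullable(String)'), 'utm_source') AS result;
-- Returns: NULL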
Return data types
All URL parameter extraction functions return String or Nullable(String), depending on whether the input URL can be NULL. The extracted values are always strings, even when they represent numbers:
-- Numeric parameters are returned as strings
SELECT extractURLParameter('https://api.com/items?page=5&limit=100', 'page') AS page_str;
-- Returns: '5' (string, not integer)
-- Convert to numbers when needed
SELECT toUInt32(extractURLParameter(url, 'page')) AS page_num FROM api_logs;
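Note that toUInt32 throws an exception on an empty string, so when the parameter might be missing, the OrZero/OrNull conversion variants are safer. A sketch against the same hypothetical api_logs table:
-- Missing parameters ('' after extraction) become 0 or NULL instead of raising errors
SELECT toUInt32OrZero(extractURLParameter(url, 'page')) AS page_num FROM api_logs;
SELECT toUInt32OrNull(extractURLParameter(url, 'page')) AS page_or_null FROM api_logs;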
Handling missing or duplicate parameters
Real-world URLs often contain missing parameters or duplicate parameter names, so you'll want defensive coding patterns to handle these cases gracefully.
Graceful null handling
Use coalesce() or ifNull() to provide default values when parameters are missing or URLs are malformed:
-- Provide default values for missing parameters
SELECT
coalesce(nullIf(extractURLParameter(url, 'utm_source'), ''), 'direct') AS traffic_source,
coalesce(nullIf(extractURLParameter(url, 'utm_campaign'), ''), 'unknown') AS campaign
FROM web_logs;
The nullIf() function converts empty strings to NULL, which allows coalesce() to apply the default value.
First-value vs array strategies
When URLs contain duplicate parameter names like ?tag=red&tag=blue, ClickHouse's extraction functions return only the first occurrence. For complete extraction, you'll need regex-based approaches:
-- Standard extraction returns first value only
SELECT extractURLParameter('https://shop.com/items?tag=red&tag=blue', 'tag');
-- Returns: 'red'
-- Extract all values using regex
SELECT extractAll('https://shop.com/items?tag=red&tag=blue', 'tag=([^&]+)') AS all_tags;
-- Returns: ['red', 'blue']
Most analytics use cases care about the first value, while e-commerce applications might need all tag values.
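If you need to aggregate over every occurrence, you can flatten the regex matches with ARRAY JOIN. A minimal sketch, assuming a hypothetical shop_logs table with a url column:
-- One output row per tag value, then count hits per tag
SELECT tag, count() AS hits
FROM shop_logs
ARRAY JOIN extractAll(url, 'tag=([^&]+)') AS tag
GROUP BY tag
ORDER BY hits DESC;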
Decode url-encoded values safely
URL parameters often contain encoded characters that need decoding before you can analyze or display them properly.
decodeURLComponent usage
The decodeURLComponent function handles percent-encoded characters, spaces, and special symbols:
-- Raw extraction returns encoded values
SELECT extractURLParameter('https://search.com/q?term=data%20analytics%20%26%20visualization', 'term');
-- Returns: 'data%20analytics%20%26%20visualization'
-- Decode for readable results
SELECT decodeURLComponent(extractURLParameter('https://search.com/q?term=data%20analytics%20%26%20visualization', 'term'));
-- Returns: 'data analytics & visualization'
Always decode parameters that might contain user input or natural language text to get meaningful results.
Combine extract and decode
For production queries, combine extraction and decoding in a single operation to avoid processing encoded strings downstream:
SELECT
decodeURLComponent(extractURLParameter(request_url, 'search_query')) AS decoded_query,
decodeURLComponent(extractURLParameter(request_url, 'user_agent')) AS decoded_agent
FROM access_logs
WHERE extractURLParameter(request_url, 'search_query') != '';
This pattern ensures consistent handling of special characters across your entire analytics pipeline.
Performance tips for billion-row log tables
When you're processing massive web log datasets, extraction performance becomes critical for maintaining fast query response times.
Pre-extract during ingestion
Rather than extracting parameters at query time, create materialized views that extract common parameters during data ingestion:
CREATE MATERIALIZED VIEW web_logs_enriched
ENGINE = MergeTree()
ORDER BY (timestamp, utm_source)
AS SELECT
timestamp,
url,
extractURLParameter(url, 'utm_source') AS utm_source,
extractURLParameter(url, 'utm_medium') AS utm_medium,
extractURLParameter(url, 'utm_campaign') AS utm_campaign
FROM web_logs_raw;
This approach trades storage space for query performance, often reducing query times dramatically.
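Downstream queries then read the pre-extracted column instead of parsing URLs per row; a sketch against the view defined above:
-- No URL parsing at query time: filter and group on the materialized column
SELECT utm_source, count() AS visits
FROM web_logs_enriched
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY utm_source
ORDER BY visits DESC;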
Use projections for flexible extraction
ClickHouse projections allow you to pre-compute extractions while maintaining query flexibility:
ALTER TABLE web_logs
ADD PROJECTION utm_params
(
SELECT
timestamp,
extractURLParameter(url, 'utm_source') AS utm_source,
extractURLParameter(url, 'utm_campaign') AS utm_campaign,
count()
GROUP BY timestamp, utm_source, utm_campaign
);
Projections automatically optimize queries that filter or group by extracted parameters.
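Keep in mind that adding a projection only affects newly written parts; to build it for existing data, materialize it explicitly (assuming the ALTER above has been applied):
-- Backfill the projection for parts written before the ALTER
ALTER TABLE web_logs MATERIALIZE PROJECTION utm_params;
-- Aggregations matching the projection's expressions can then be served from it
SELECT extractURLParameter(url, 'utm_campaign') AS utm_campaign, count() AS visits
FROM web_logs
GROUP BY utm_campaign;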
Alternatives and when to use them
While URL parameter extraction functions handle most use cases, certain scenarios benefit from different approaches.
Regex functions
Use extractAll() with regex patterns when you need custom parsing logic or want to extract multiple occurrences of the same parameter:
-- Extract all numeric IDs from complex URLs
SELECT extractAll(url, 'id[=:](\d+)') AS all_ids
FROM api_logs
WHERE url LIKE '%id%';
-- Extract custom key-value patterns (extractAll keeps only the first capture group,
-- so use extractAllGroupsVertical to get both keys and values)
SELECT extractAllGroupsVertical(url, '([a-z_]+)=([^&]+)') AS all_params
FROM web_logs;
Regex extraction offers more flexibility but typically runs slower than dedicated URL functions.
URLHierarchy function
When analyzing URL paths rather than parameters, the URLHierarchy function breaks down the entire URL structure:
SELECT URLHierarchy('https://shop.example.com/category/electronics/phones?brand=apple') AS hierarchy;
-- Returns an array of progressively longer URL prefixes, truncated at path and query separators
This approach works well for path-based analytics and URL structure analysis.
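To count traffic at each hierarchy level, flatten the result with ARRAY JOIN; a sketch assuming a web_logs table with a url column:
-- One row per hierarchy level, then hits per level
SELECT level, count() AS hits
FROM web_logs
ARRAY JOIN URLHierarchy(url) AS level
GROUP BY level
ORDER BY hits DESC;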
JSONExtract for structured logs
When URLs are embedded within JSON log entries, combine JSON extraction with URL parameter extraction:
SELECT
extractURLParameter(JSONExtractString(log_data, 'request_url'), 'session_id') AS session,
JSONExtractString(log_data, 'user_id') AS user
FROM structured_logs
WHERE JSONHas(log_data, 'request_url');
This pattern appears frequently in modern application logging where URLs are part of larger structured payloads.
Building ClickHouse-based web analytics APIs with URL parameter extraction using Tinybird
A common web analytics pattern involves extracting UTM parameters and other tracking data from URL strings to analyze marketing campaign performance, traffic sources, and user behavior. Using ClickHouse's URL parameter extraction functions, you can build APIs that process millions of web requests and provide real-time insights into campaign effectiveness. For more advanced web analytics scenarios including multitenancy and AI-powered insights, these parameter extraction techniques form the foundation of comprehensive user behavior tracking.
Here's how to build a complete web analytics system using Tinybird's managed ClickHouse platform.
Step 1: Create a data source for web analytics events
First, create a data source to store web analytics events with URL parameter extraction. Create a web_analytics.datasource file:
SCHEMA >
`timestamp` DateTime64(3, 'UTC') `json:$.timestamp`,
`session_id` String `json:$.session_id`,
`user_id` String `json:$.user_id`,
`page_url` String `json:$.page_url`,
`referrer_url` String `json:$.referrer_url`,
`user_agent` String `json:$.user_agent`,
`ip_address` String `json:$.ip_address`,
`device_type` LowCardinality(String) `json:$.device_type`,
`country` LowCardinality(String) `json:$.country`,
`utm_source` String MATERIALIZED decodeURLComponent(extractURLParameter(page_url, 'utm_source')),
`utm_medium` String MATERIALIZED decodeURLComponent(extractURLParameter(page_url, 'utm_medium')),
`utm_campaign` String MATERIALIZED decodeURLComponent(extractURLParameter(page_url, 'utm_campaign')),
`utm_term` String MATERIALIZED decodeURLComponent(extractURLParameter(page_url, 'utm_term')),
`utm_content` String MATERIALIZED decodeURLComponent(extractURLParameter(page_url, 'utm_content'))
ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, session_id"
Step 2: Test locally with sample data
Start the local Tinybird development server:
tb dev
This starts a local Tinybird instance at http://localhost:7181. Your data sources are automatically deployed to the local environment.
Ingest sample web analytics data using the local Events API:
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TINYBIRD_TOKEN" \
"http://localhost:7181/v0/events?name=web_analytics" \
-d '[
{"timestamp": "2024-12-01 10:30:00", "session_id": "sess_001", "user_id": "user_123", "page_url": "https://example.com/landing?utm_source=google&utm_medium=cpc&utm_campaign=winter_sale&utm_term=analytics%20tools", "referrer_url": "https://www.google.com/search", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "ip_address": "192.168.1.100", "device_type": "Desktop", "country": "US"},
{"timestamp": "2024-12-01 11:15:00", "session_id": "sess_002", "user_id": "user_456", "page_url": "https://example.com/products?utm_source=facebook&utm_medium=social&utm_campaign=winter_sale&utm_content=carousel_ad", "referrer_url": "https://www.facebook.com", "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0)", "ip_address": "192.168.1.101", "device_type": "Mobile", "country": "CA"},
{"timestamp": "2024-12-01 12:00:00", "session_id": "sess_003", "user_id": "user_789", "page_url": "https://example.com/blog/analytics-guide?utm_source=newsletter&utm_medium=email&utm_campaign=content_marketing", "referrer_url": "https://mail.google.com", "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", "ip_address": "192.168.1.102", "device_type": "Desktop", "country": "GB"},
{"timestamp": "2024-12-01 14:30:00", "session_id": "sess_004", "user_id": "user_012", "page_url": "https://example.com/pricing", "referrer_url": "https://example.com/landing", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "ip_address": "192.168.1.103", "device_type": "Desktop", "country": "US"}
]'
Step 3: Create marketing analytics APIs
Create a pipe for UTM campaign performance analysis. Create campaign_analytics.pipe:
NODE campaign_performance
SQL >
%
SELECT
utm_campaign,
utm_source,
utm_medium,
utm_term,
utm_content,
count() AS total_visits,
uniq(session_id) AS unique_sessions,
uniq(user_id) AS unique_users,
uniq(ip_address) AS unique_ips,
-- Traffic source categorization
CASE
WHEN utm_source = '' AND utm_medium = '' THEN 'direct'
WHEN utm_medium = 'organic' THEN 'organic_search'
WHEN utm_medium IN ('cpc', 'ppc') THEN 'paid_search'
WHEN utm_medium = 'social' THEN 'social_media'
WHEN utm_medium = 'email' THEN 'email_marketing'
WHEN utm_medium = 'referral' THEN 'referral'
ELSE 'other'
END AS traffic_category,
-- Device and location insights
countIf(device_type = 'Mobile') AS mobile_visits,
countIf(device_type = 'Desktop') AS desktop_visits,
topK(3)(country) AS top_countries
FROM web_analytics
WHERE timestamp >= {{DateTime(start_date, '2024-12-01 00:00:00')}}
AND timestamp <= {{DateTime(end_date, '2024-12-31 23:59:59')}}
{% if defined(campaign_filter) %}
AND utm_campaign = {{String(campaign_filter)}}
{% end %}
{% if defined(source_filter) %}
AND utm_source = {{String(source_filter)}}
{% end %}
{% if defined(medium_filter) %}
AND utm_medium = {{String(medium_filter)}}
{% end %}
{% if defined(country_filter) %}
AND country = {{String(country_filter)}}
{% end %}
GROUP BY
utm_campaign,
utm_source,
utm_medium,
utm_term,
utm_content,
traffic_category
ORDER BY total_visits DESC
{% if defined(limit) %}
LIMIT {{Int32(limit, 100)}}
{% end %}
TYPE ENDPOINT
Create a daily traffic trends API. Create daily_traffic_trends.pipe:
NODE daily_trends
SQL >
%
SELECT
toDate(timestamp) AS date,
toHour(timestamp) AS hour,
utm_source,
utm_medium,
count() AS hourly_visits,
uniq(session_id) AS hourly_sessions,
uniq(user_id) AS hourly_users,
-- Running totals for trend analysis
sum(count()) OVER (
PARTITION BY toDate(timestamp), utm_source, utm_medium
ORDER BY toHour(timestamp)
ROWS UNBOUNDED PRECEDING
) AS cumulative_daily_visits,
-- Parameter extraction for custom analysis
groupArray(DISTINCT extractURLParameter(page_url, {{String(custom_param, 'utm_term')}})) AS custom_param_values
FROM web_analytics
WHERE timestamp >= {{DateTime(start_date, '2024-12-01 00:00:00')}}
AND timestamp <= {{DateTime(end_date, '2024-12-07 23:59:59')}}
AND (utm_source != '' OR utm_medium != '')
{% if defined(source_filter) %}
AND utm_source = {{String(source_filter)}}
{% end %}
{% if defined(custom_param) %}
AND extractURLParameter(page_url, {{String(custom_param)}}) != ''
{% end %}
GROUP BY date, hour, utm_source, utm_medium
ORDER BY date DESC, hour DESC, hourly_visits DESC
{% if defined(limit) %}
LIMIT {{Int32(limit, 200)}}
{% end %}
TYPE ENDPOINT
Step 4: Test the APIs locally
Test your marketing analytics APIs locally:
# Test campaign performance analysis
curl "http://localhost:7181/v0/pipes/campaign_analytics.json?start_date=2024-12-01%2000:00:00&end_date=2024-12-31%2023:59:59"
# Filter by specific campaign
curl "http://localhost:7181/v0/pipes/campaign_analytics.json?campaign_filter=winter_sale&source_filter=google"
# Test daily traffic trends
curl "http://localhost:7181/v0/pipes/daily_traffic_trends.json?start_date=2024-12-01%2000:00:00&end_date=2024-12-07%2023:59:59"
# Extract custom parameters dynamically
curl "http://localhost:7181/v0/pipes/daily_traffic_trends.json?custom_param=utm_content&source_filter=facebook"
You should see URL parameter extraction in action: UTM parameters are automatically decoded and materialized, while custom parameters can be extracted dynamically.
Step 5: Deploy to production
Once you've tested your web analytics locally and verified the parameter extraction works correctly, deploy to your Tinybird cloud workspace:
tb --cloud deploy
Your APIs are now available in production:
# Production API calls
curl "$TINYBIRD_HOST/v0/pipes/campaign_analytics.json?start_date=2024-12-01%2000:00:00&end_date=2024-12-31%2023:59:59" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
curl "$TINYBIRD_HOST/v0/pipes/daily_traffic_trends.json?custom_param=utm_term&source_filter=google" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
The APIs return comprehensive marketing analytics with proper URL parameter handling:
{
"meta": [
{"name": "utm_campaign", "type": "String"},
{"name": "utm_source", "type": "String"},
{"name": "utm_medium", "type": "String"},
{"name": "utm_term", "type": "String"},
{"name": "utm_content", "type": "String"},
{"name": "total_visits", "type": "UInt64"},
{"name": "unique_sessions", "type": "UInt64"},
{"name": "unique_users", "type": "UInt64"},
{"name": "traffic_category", "type": "String"},
{"name": "mobile_visits", "type": "UInt64"},
{"name": "desktop_visits", "type": "UInt64"},
{"name": "top_countries", "type": "Array(String)"}
],
"data": [
{
"utm_campaign": "winter_sale",
"utm_source": "google",
"utm_medium": "cpc",
"utm_term": "analytics tools",
"utm_content": "",
"total_visits": 1,
"unique_sessions": 1,
"unique_users": 1,
"traffic_category": "paid_search",
"mobile_visits": 0,
"desktop_visits": 1,
"top_countries": ["US"]
},
{
"utm_campaign": "winter_sale",
"utm_source": "facebook",
"utm_medium": "social",
"utm_term": "",
"utm_content": "carousel_ad",
"total_visits": 1,
"unique_sessions": 1,
"unique_users": 1,
"traffic_category": "social_media",
"mobile_visits": 1,
"desktop_visits": 0,
"top_countries": ["CA"]
}
],
"rows": 2,
"statistics": {
"elapsed": 0.003,
"rows_read": 4,
"bytes_read": 640
}
}
This approach provides:
- Materialized URL parameter extraction for optimal query performance
- Automatic URL decoding to handle encoded characters in campaign names and terms
- Traffic source categorization based on UTM parameter combinations
- Dynamic parameter extraction allowing APIs to extract any URL parameter by name
- Multi-dimensional analytics including device type, geographic, and temporal analysis
- Real-time insights with sub-millisecond API response times
Your web analytics data becomes available as production-ready API endpoints that automatically extract and decode URL parameters while providing comprehensive marketing campaign insights. Start building with Tinybird's free plan to create your own web analytics APIs.
What to do next with ClickHouse and Tinybird
ClickHouse provides three main functions for URL parameter extraction. The extractURLParameter function works for single values, extractURLParameters returns every parameter at once, and url_extract_parameter offers a snake_case alternative. Always decode extracted parameters using decodeURLComponent when they might contain special characters, and handle missing parameters gracefully with coalesce or nullIf.
For production workloads processing billions of rows, pre-extract common parameters during ingestion using materialized views or projections rather than extracting at query time. This approach typically improves query performance significantly while slightly increasing storage requirements.
The key performance optimizations include:
- Materialized views: Pre-extract during ingestion for faster queries
- Projections: Automatically optimize filtered queries on extracted parameters
- Defensive coding: Use coalesce() and nullIf() for robust parameter handling
- Decoding: Always decode user-facing parameters with decodeURLComponent
While managing ClickHouse infrastructure for parameter extraction at scale involves significant complexity around replication, backups, and performance tuning, Tinybird's managed ClickHouse platform abstracts most of these operational concerns. Users can focus on building analytics features rather than managing database infrastructure, although some operational settings or monitoring may still require user attention depending on the plan and use case.
Get started with Tinybird for free to build real-time analytics APIs without managing ClickHouse infrastructure.
Additional resources
- Functions for Working with URLs
- String Functions
- Materialized Views
- Real-time analytics with billion rows at scale
- Tinybird vs. ClickHouse: What's the difference?
- Web Analytics Starter Kit
FAQs about query parameter extraction in ClickHouse
Can I extract query parameters from https URLs?
Yes, the extraction functions work identically on HTTP and HTTPS URLs since they only parse the string content, not the protocol. The functions analyze the query string portion after the ? character regardless of the URL scheme.
How do I handle URLs with fragments or anchors?
ClickHouse URL functions ignore fragments (the part after #) and only process the query string portion for parameter extraction. If your URLs contain fragments, they won't interfere with parameter extraction from the query string.
What happens when extracting parameters from malformed URLs?
The functions don't throw errors on malformed URLs; they simply return an empty string when the parameter can't be located, making them safe for processing messy log data. This graceful behavior prevents queries from crashing when encountering invalid URL formats in real-world datasets.
Can I index extracted query parameters for faster filtering?
You can create materialized views or projections that pre-extract parameters during ingestion, which enables efficient filtering on parameter values. This approach is much faster than extracting parameters at query time for large datasets.
How do I extract parameters from URLs stored in arrays?
Use arrayMap() with the extraction function to process each URL in an array, or arrayJoin() to flatten the array first. For example, arrayMap(url -> extractURLParameter(url, 'campaign'), url_array) processes all URLs in the array.
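A minimal sketch of both patterns, assuming a hypothetical events table with an Array(String) column named url_array:
-- Per-row extraction over an array of URLs
SELECT arrayMap(u -> extractURLParameter(u, 'campaign'), url_array) AS campaigns
FROM events;
-- Or flatten first, producing one row per URL
SELECT extractURLParameter(u, 'campaign') AS campaign
FROM events
ARRAY JOIN url_array AS u;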