When you're analyzing web traffic or building APIs that process URLs, extracting clean query parameters without the leading `?` or trailing `#` fragment becomes a frequent requirement. ClickHouse's built-in URL functions handle this parsing automatically, eliminating the need for complex regex patterns or manual string manipulation.
This guide covers the primary methods for extracting query strings using `queryString()` and related functions, along with performance optimization strategies for large datasets and real-time ingestion patterns.
ClickHouse URL query functions explained
The `queryString()` function extracts the query string portion from a URL while automatically removing the leading question mark and any fragment identifier. A query string contains the parameters that follow the `?` in a URL, like `param1=value1&param2=value2` in `https://example.com/path?param1=value1&param2=value2#section`.
You might be wondering why you'd want to extract just the query string without the `?` symbol. When you're analyzing web traffic or building APIs that process URL parameters, you often want the clean parameter string for logging, forwarding to other services, or parsing individual values.
ClickHouse provides several built-in functions specifically designed for URL manipulation. The main ones you'll work with are `queryString()`, `cutFragment()`, and `cutQueryString()`. Each handles a different aspect of URL parsing without requiring manual string manipulation or regular expressions.
Syntax to extract the query string without ? or hash
The primary method uses ClickHouse's `queryString()` function, which returns a clean query string without the leading question mark or trailing fragment. This function handles the most common use case in a single operation.
1. queryString
The `queryString()` function takes a URL string as input and returns the query parameters as a clean string:
SELECT queryString('https://example.com/path?param1=value1&param2=value2#fragment') AS query_string;
-- Returns: param1=value1&param2=value2
This function automatically handles both the question mark removal and fragment stripping in a single operation. You can apply it directly to URL columns in your tables without additional preprocessing.
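For instance, you might apply it across a whole column; here's a minimal sketch, assuming a hypothetical web_logs table with a url column:
SELECT url, queryString(url) AS query_string
FROM web_logs
LIMIT 10;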
2. cutFragment
When you're working with URLs that contain fragments but want to preserve other URL components, `cutFragment()` removes everything after the `#` symbol:
SELECT
cutFragment('https://example.com/path?param1=value1#section') AS clean_url,
queryString(cutFragment('https://example.com/path?param1=value1#section')) AS query_string;
-- Returns: https://example.com/path?param1=value1, param1=value1
Though `queryString()` already handles fragments internally, `cutFragment()` becomes useful when you need the clean URL for other operations. You'll find this pattern helpful when building URL normalization pipelines.
3. cutQueryString
The `cutQueryString()` function removes the query string entirely, returning only the base URL:
SELECT cutQueryString('https://example.com/path?param1=value1&param2=value2') AS base_url;
-- Returns: https://example.com/path
This function serves as the complement to `queryString()` when you want URL normalization or need to separate the base path from its parameters.
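Putting the two together, here's a minimal sketch that splits a URL into its base and its parameters in one pass:
SELECT
    cutQueryString(cutFragment(url)) AS base_url,
    queryString(url) AS params
FROM (SELECT 'https://example.com/path?a=1&b=2#top' AS url);
-- Returns: https://example.com/path, a=1&b=2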
Handling missing or empty query strings
ClickHouse functions behave predictably when URLs lack query strings, though the specific return values vary between functions. Understanding this behavior helps you write more robust queries.
1. Returning NULL versus empty string
The `queryString()` function returns an empty string when no query parameters exist:
SELECT queryString('https://example.com/path') AS result;
-- Returns: '' (empty string)
SELECT queryString('https://example.com/path?') AS result;
-- Returns: '' (empty string)
This consistent behavior simplifies downstream processing since you don't need to handle NULL values in most cases. However, you might want to distinguish between URLs with empty query strings versus URLs with no query strings at all.
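If that distinction matters, you can check for the `?` marker itself. A minimal sketch using `position()`, assuming a urls_table with a url column:
SELECT
    url,
    position(url, '?') > 0 AS has_query_marker,
    queryString(url) AS query_string
FROM urls_table;
-- has_query_marker is 1 for 'https://example.com/path?' but 0 for 'https://example.com/path'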
2. Using conditionals for safe defaults
When you want explicit default values for missing query strings, combine `queryString()` with conditional logic:
SELECT
url,
CASE
WHEN queryString(url) = '' THEN 'no-params'
ELSE queryString(url)
END AS query_or_default
FROM urls_table;
This pattern proves especially useful when building analytics queries where empty query strings might indicate specific user behaviors.
queryString versus extractURLParameter and regex
Different URL parsing approaches serve different use cases, from extracting complete query strings to isolating individual parameters. The choice between methods affects both performance and code complexity.
1. Function comparison
Here's how the main URL parsing functions compare:
- queryString(): Extracts complete query string, fast performance, returns string
- extractURLParameter(): Gets specific parameter value, fast performance, returns string
- extractURLParameterNames(): Lists all parameter names, medium performance, returns array
- Regex patterns: Handles complex URL parsing, slow performance, varies by pattern
Choose `queryString()` when you want the entire parameter string for logging or forwarding. Use `extractURLParameter()` when targeting specific parameters like tracking codes or user IDs.
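As a quick illustration of the two targeted functions:
SELECT
    extractURLParameter('https://example.com/?utm_source=google&id=42', 'utm_source') AS source,
    extractURLParameterNames('https://example.com/?utm_source=google&id=42') AS param_names;
-- Returns: google, ['utm_source','id']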
2. When regex is still useful
Regex patterns become necessary for non-standard URL formats or when you want to extract multiple URL components simultaneously:
SELECT extractAllGroups(url, '([^?]+)\\?([^#]+)') AS url_parts
FROM urls_table;
-- Returns: [['https://example.com/path', 'param1=value1&param2=value2']]
However, regex parsing typically performs 3-5x slower than dedicated URL functions on large datasets. You'll notice this performance difference becomes significant when processing millions of URLs.
Version compatibility and behavior changes
The `queryString()` function has been available since ClickHouse 1.1.54388 with consistent behavior across versions. Recent versions include performance optimizations but maintain the same API contract.
Newer ClickHouse releases continue to add URL helper functions, but the core `queryString()` behavior remains unchanged. If you're running an older version, verify function availability with `SELECT version()` before deployment.
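You can also query the function registry directly; a quick check:
SELECT version();
SELECT name FROM system.functions WHERE name = 'queryString';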
Optimizing performance on billion-row tables
URL parsing performance becomes critical when processing large datasets, particularly for real-time analytics workloads. The difference between optimized and unoptimized approaches can mean the difference between sub-second and multi-minute query times.
1. Projections or materialized views
Pre-compute query string extraction using materialized views to avoid repeated parsing:
CREATE MATERIALIZED VIEW url_parsed_mv
ENGINE = MergeTree()
ORDER BY (timestamp, domain)
AS SELECT
timestamp,
domain,
queryString(full_url) AS query_string,
extractURLParameter(full_url, 'utm_source') AS utm_source
FROM raw_logs;
This approach reduces query latency from seconds to milliseconds on billion-row tables since the parsing happens once during ingestion. The trade-off is additional storage space for the parsed results.
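Queries then read the pre-parsed columns instead of re-parsing URLs; a minimal sketch against the view above:
SELECT utm_source, count() AS hits
FROM url_parsed_mv
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY utm_source
ORDER BY hits DESC;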
2. Avoiding on-the-fly regex
Replace regex-based URL parsing with ClickHouse's native functions wherever possible:
-- Slow: regex parsing
SELECT extractAllGroups(url, '\\?([^#]+)') FROM logs;
-- Fast: native function
SELECT queryString(url) FROM logs;
Native functions leverage ClickHouse's optimized C++ implementations and often include SIMD optimizations for string processing.
Streaming ingestion and on-the-fly parsing
Processing query strings during data ingestion rather than at query time reduces storage requirements and improves query performance. This approach works particularly well for high-throughput scenarios.
1. Kafka connector pattern
Extract query strings while ingesting from Kafka streams using materialized views:
-- Target table for parsed results (schema inferred from the materialized view below)
CREATE TABLE parsed_urls (
    timestamp DateTime,
    raw_url String,
    query_params String
) ENGINE = MergeTree()
ORDER BY timestamp;

CREATE TABLE kafka_raw (
    timestamp DateTime,
    raw_url String
) ENGINE = Kafka()
SETTINGS kafka_broker_list = 'localhost:9092',
    kafka_topic_list = 'web_logs',
    kafka_group_name = 'clickhouse_web_logs', -- required; any unique consumer group name
    kafka_format = 'JSONEachRow'; -- required; must match the message format

CREATE MATERIALIZED VIEW parsed_urls_mv TO parsed_urls AS
SELECT
    timestamp,
    raw_url,
    queryString(raw_url) AS query_params
FROM kafka_raw;
This pattern processes millions of URLs per second while maintaining low memory overhead. The parsing happens automatically as data flows through the system.
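Once events flow through, a quick sanity check on the target table:
SELECT raw_url, query_params
FROM parsed_urls
ORDER BY timestamp DESC
LIMIT 5;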
2. Materialized view pattern
Automatically parse URLs on insert using materialized views attached to your main table:
CREATE MATERIALIZED VIEW url_analytics_mv
ENGINE = SummingMergeTree() -- sums page_views for rows sharing a sorting key on merge
ORDER BY (date, domain, query_string) -- sorting key must cover every GROUP BY column
AS SELECT
    toDate(timestamp) AS date,
    domain(url) AS domain,
    queryString(url) AS query_string,
    count() AS page_views
FROM web_logs
GROUP BY date, domain, query_string;
This approach enables real-time URL analytics without impacting insert performance.
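Because SummingMergeTree collapses same-key rows only during background merges, reads should re-aggregate to be safe; a minimal sketch:
SELECT date, domain, sum(page_views) AS page_views
FROM url_analytics_mv
GROUP BY date, domain
ORDER BY date DESC;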
Parsing individual parameters into key-value maps
Beyond extracting the complete query string, you can parse individual parameters into ClickHouse's native Map data type for structured analysis. This becomes especially useful when you want to analyze specific parameters across millions of URLs.
1. extractKeyValuePairs for key-value pairs
Convert query strings into Map types for easier parameter access:
SELECT
url,
extractKeyValuePairs(queryString(url), '=', '&') AS params_map
FROM urls_table;
-- Returns: {'utm_source': 'google', 'utm_medium': 'cpc'}
The `extractKeyValuePairs()` function splits the query string on `&` for pairs and `=` for key-value separation, returning a native `Map(String, String)`. This creates a structured format for parameter analysis.
2. Accessing parameters with map subscripts
Extract specific parameter values from the parsed map:
SELECT
    url,
    extractKeyValuePairs(queryString(url), '=', '&')['utm_source'] AS traffic_source,
    extractKeyValuePairs(queryString(url), '=', '&')['campaign_id'] AS campaign
FROM marketing_data;
This pattern works particularly well for marketing attribution analysis where you want to isolate specific tracking parameters.
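For example, a sketch that aggregates visits by traffic source, assuming the marketing_data table above:
SELECT
    extractKeyValuePairs(queryString(url), '=', '&')['utm_source'] AS traffic_source,
    count() AS visits
FROM marketing_data
GROUP BY traffic_source
ORDER BY visits DESC;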
Building a real-time, ClickHouse-based URL analytics API with Tinybird
Let's create a complete example that ingests web traffic data and exposes an API endpoint returning query parameter analytics using `queryString()`.
This walkthrough shows how URL query parsing integrates into real-world analytics workflows, from data ingestion through API deployment. You'll see exactly how the function works in practice.
Tinybird is a managed service that makes it easy to build real-time analytics APIs powered by ClickHouse. It handles the infrastructure, scaling, and API generation so you can focus on writing SQL and building analytics features.
In this example, we'll use Tinybird to:
- Create a data source for ingesting web traffic events
- Write a pipe that processes and aggregates query parameters using `queryString()`
- Deploy an API endpoint that serves query parameter analytics
The workflow demonstrates how to combine ClickHouse's powerful URL processing with Tinybird's API capabilities to build production-ready analytics services.
1. Create the data source
First, create a data source to store your web traffic events:
SCHEMA >
`timestamp` DateTime64(3, 'UTC') `json:$.timestamp`,
`url` String `json:$.url`,
`user_id` UInt32 `json:$.user_id`,
`session_id` String `json:$.session_id`,
`page_title` String `json:$.page_title`,
`referrer` String `json:$.referrer`
ENGINE "MergeTree"
ENGINE_SORTING_KEY "timestamp"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
Now build the data source locally:
tb dev
This builds the project on Tinybird Local (which you'll need to have running) and creates your data source. Then ingest some sample data using the Events API:
curl -X POST "http://localhost:7181/v0/events?name=web_traffic" \
-H "Content-Type: application/json" \
-d '{"timestamp": "2025-10-03 14:30:15.123", "url": "https://www.example.com/products?utm_source=google&utm_medium=cpc&campaign_id=123", "user_id": 1001, "session_id": "sess_abc123", "page_title": "Products Page", "referrer": "https://www.google.com/search"}
{"timestamp": "2025-10-03 14:30:16.456", "url": "https://blog.example.com/article?id=456&source=newsletter", "user_id": 1002, "session_id": "sess_def456", "page_title": "Blog Article", "referrer": "https://www.facebook.com"}
{"timestamp": "2025-10-03 14:30:17.789", "url": "https://www.github.com/tinybird?tab=repositories&sort=stars", "user_id": 1003, "session_id": "sess_ghi789", "page_title": "GitHub Profile", "referrer": "https://www.twitter.com"}'
2. Write the pipe query
Create a pipe that analyzes query parameters:
NODE traffic_events
SQL >
SELECT
timestamp,
url,
queryString(url) AS query_string,
domain(url) AS domain,
extractURLParameter(url, 'utm_source') AS utm_source,
extractURLParameter(url, 'utm_medium') AS utm_medium,
extractURLParameter(url, 'campaign_id') AS campaign_id,
user_id,
session_id,
page_title
FROM web_traffic
WHERE timestamp >= now() - INTERVAL 7 DAY
AND queryString(url) != ''
NODE query_analytics
SQL >
%
SELECT
domain,
utm_source,
utm_medium,
campaign_id,
COUNT(*) AS total_events,
COUNT(DISTINCT user_id) AS unique_users,
COUNT(DISTINCT session_id) AS unique_sessions,
COUNT(DISTINCT query_string) AS unique_query_strings,
MIN(timestamp) AS first_seen,
MAX(timestamp) AS last_seen
FROM traffic_events
WHERE
    1 = 1
    {% if defined(domain) %}
    AND domain = {{ String(domain) }}
    {% end %}
    {% if defined(utm_source) %}
    AND utm_source = {{ String(utm_source) }}
    {% end %}
    {% if defined(date_from) %}
    AND timestamp >= {{ DateTime(date_from) }}
    {% end %}
    {% if defined(date_to) %}
    AND timestamp <= {{ DateTime(date_to) }}
    {% end %}
GROUP BY domain, utm_source, utm_medium, campaign_id
ORDER BY total_events DESC
LIMIT {{ Int32(limit, 100) }}
TYPE endpoint
The `queryString()` function ensures consistent query parameter extraction regardless of URL complexity.
3. Deploy and call the endpoint
Deploy your pipe to create the API endpoint:
tb --cloud deploy
Then call your new endpoint:
curl -H "Authorization: Bearer <your_token>" \
"https://api.tinybird.co/v0/pipes/query_analytics.json?domain=example.com&limit=5"
The response returns neatly aggregated data with query parameter insights:
{
"meta": [
{
"name": "domain",
"type": "String"
},
{
"name": "utm_source",
"type": "Nullable(String)"
},
{
"name": "utm_medium",
"type": "Nullable(String)"
},
{
"name": "campaign_id",
"type": "Nullable(String)"
},
{
"name": "total_events",
"type": "UInt64"
},
{
"name": "unique_users",
"type": "UInt64"
},
{
"name": "unique_sessions",
"type": "UInt64"
},
{
"name": "unique_query_strings",
"type": "UInt64"
},
{
"name": "first_seen",
"type": "DateTime"
},
{
"name": "last_seen",
"type": "DateTime"
}
],
"data": [
{
"domain": "example.com",
"utm_source": "google",
"utm_medium": "cpc",
"campaign_id": "123",
"total_events": 1247,
"unique_users": 892,
"unique_sessions": 1156,
"unique_query_strings": 23,
"first_seen": "2025-10-03 14:30:15",
"last_seen": "2025-10-03 15:45:22"
},
{
"domain": "example.com",
"utm_source": null,
"utm_medium": null,
"campaign_id": null,
"total_events": 456,
"unique_users": 234,
"unique_sessions": 345,
"unique_query_strings": 12,
"first_seen": "2025-10-03 14:30:16",
"last_seen": "2025-10-03 15:30:18"
}
],
"rows": 2,
"statistics": {
"elapsed": 0.045,
"rows_read": 2000,
"bytes_read": 89000
}
}
Next steps for fast URL analytics with ClickHouse
While ClickHouse's URL functions provide excellent parsing performance, managing ClickHouse clusters at scale introduces operational complexity. Tinybird's managed ClickHouse platform removes most of that infrastructure overhead while preserving ClickHouse-level query performance.
The platform includes built-in API generation, real-time ingestion, and automatic scaling. You can deploy URL parsing pipelines in minutes rather than weeks.
Sign up for a free Tinybird account to build and test your first ClickHouse-based API in just a few minutes.
Additional resources
- queryString function
- extractURLParameter function
- cutFragment function
- URL functions in ClickHouse
- String functions in ClickHouse
- How to extract URL fragments without hash symbols (#) in ClickHouse
- How to extract the protocol of a URL in ClickHouse
- How to extract domains without www from URLs in ClickHouse
Frequently asked questions about extracting ClickHouse query strings
Did queryString always strip the leading question mark?
Yes, the `queryString()` function has consistently returned clean query strings without the `?` prefix across all ClickHouse versions since its introduction. This behavior remains unchanged in current versions, ensuring backward compatibility for existing queries.
What happens if the URL contains encoded "?" or "#" characters?
ClickHouse treats URL-encoded characters like `%3F` (encoded `?`) and `%23` (encoded `#`) as literal characters within the query string. The function only recognizes actual `?` and `#` characters as delimiters, not their encoded equivalents.
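A quick sketch of that behavior:
SELECT queryString('https://example.com/path%3Ffake?real=1%23x#frag') AS qs;
-- Returns: real=1%23x (the encoded %3F and %23 are not treated as delimiters)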
Can I extract multiple query strings from a single text blob?
Yes, combine `extractAll()` with a regex pattern to find every URL in a text blob, then map `queryString()` over the matches:
SELECT arrayMap(x -> queryString(x), extractAll(text, 'https?://[^\\s]+')) AS all_query_strings
FROM text_data;
This approach works well for parsing log files or documents containing multiple URLs.
How do I remove the query string entirely from a URL column?
Use the `cutQueryString()` function to get clean URLs without parameters:
SELECT cutQueryString(url) AS clean_url FROM urls_table;
-- Converts: https://example.com/path?param=value
-- To: https://example.com/path
This function proves useful for URL normalization in analytics where you want to group pages regardless of their parameters.