When you're analyzing web traffic or building APIs that process URLs, extracting clean query parameters without the leading `?` or trailing `#` fragment becomes a frequent requirement. ClickHouse's built-in URL functions handle this parsing automatically, eliminating the need for complex regex patterns or manual string manipulation.
This guide covers the primary methods for extracting query strings using `queryString()` and related functions, along with performance optimization strategies for large datasets and real-time ingestion patterns.
ClickHouse URL query functions explained
The `queryString()` function extracts the query string portion from a URL while automatically removing the leading question mark and any fragment identifier. A query string contains the parameters that follow the `?` in a URL, like `param1=value1&param2=value2` in `https://example.com/path?param1=value1&param2=value2#section`.
You might be wondering why you'd want to extract just the query string without the `?` symbol. When you're analyzing web traffic or building APIs that process URL parameters, you often want the clean parameter string for logging, forwarding to other services, or parsing individual values.
ClickHouse provides several built-in functions specifically designed for URL manipulation. The main ones you'll work with are `queryString()`, `cutFragment()`, and `cutQueryString()`. Each handles a different aspect of URL parsing without requiring manual string manipulation or regular expressions.
Syntax to extract the query string without ? or hash
The primary method uses ClickHouse's `queryString()` function, which returns a clean query string without the leading question mark or trailing fragment. This function handles the most common use case in a single operation.
1. queryString
The `queryString()` function takes a URL string as input and returns the query parameters as a clean string:
SELECT queryString('https://example.com/path?param1=value1&param2=value2#fragment') AS query_string;
-- Returns: param1=value1&param2=value2
This function automatically handles both the question mark removal and fragment stripping in a single operation. You can apply it directly to URL columns in your tables without additional preprocessing.
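For instance, you might apply it across a whole column; here's a minimal sketch, assuming a hypothetical web_logs table with a url column:
SELECT url, queryString(url) AS query_string
FROM web_logs
LIMIT 10;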
2. cutFragment
When you're working with URLs that contain fragments but want to preserve other URL components, `cutFragment()` removes everything after the `#` symbol:
SELECT
cutFragment('https://example.com/path?param1=value1#section') AS clean_url,
queryString(cutFragment('https://example.com/path?param1=value1#section')) AS query_string;
-- Returns: https://example.com/path?param1=value1, param1=value1
Though `queryString()` already handles fragments internally, `cutFragment()` becomes useful when you need the clean URL for other operations. You'll find this pattern helpful when building URL normalization pipelines.
3. cutQueryString
The `cutQueryString()` function removes the query string entirely, returning only the base URL:
SELECT cutQueryString('https://example.com/path?param1=value1&param2=value2') AS base_url;
-- Returns: https://example.com/path
This function serves as the complement to `queryString()` when you want URL normalization or need to separate the base path from its parameters.
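Putting the two together, here's a minimal sketch that splits a URL into its base and its parameters in one pass:
SELECT
    cutQueryString(cutFragment(url)) AS base_url,
    queryString(url) AS params
FROM (SELECT 'https://example.com/path?a=1&b=2#top' AS url);
-- Returns: https://example.com/path, a=1&b=2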
Handling missing or empty query strings
ClickHouse functions behave predictably when URLs lack query strings, though the specific return values vary between functions. Understanding this behavior helps you write more robust queries.
1. Returning NULL versus empty string
The `queryString()` function returns an empty string when no query parameters exist:
SELECT queryString('https://example.com/path') AS result;
-- Returns: '' (empty string)
SELECT queryString('https://example.com/path?') AS result;
-- Returns: '' (empty string)
This consistent behavior simplifies downstream processing since you don't need to handle NULL values in most cases. However, you might want to distinguish between URLs with empty query strings versus URLs with no query strings at all.
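If that distinction matters, you can check for the `?` marker itself. A minimal sketch using `position()`, assuming a urls_table with a url column:
SELECT
    url,
    position(url, '?') > 0 AS has_query_marker,
    queryString(url) AS query_string
FROM urls_table;
-- has_query_marker is 1 for 'https://example.com/path?' but 0 for 'https://example.com/path'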
2. Using conditionals for safe defaults
When you want explicit default values for missing query strings, combine `queryString()` with conditional logic:
SELECT
url,
CASE
WHEN queryString(url) = '' THEN 'no-params'
ELSE queryString(url)
END AS query_or_default
FROM urls_table;
This pattern proves especially useful when building analytics queries where empty query strings might indicate specific user behaviors.
queryString versus extractURLParameter and regex
Different URL parsing approaches serve different use cases, from extracting complete query strings to isolating individual parameters. The choice between methods affects both performance and code complexity.
1. Function comparison
Here's how the main URL parsing functions compare:
- queryString(): Extracts complete query string, fast performance, returns string
- extractURLParameter(): Gets specific parameter value, fast performance, returns string
- extractURLParameterNames(): Lists all parameter names, medium performance, returns array
- Regex patterns: Handles complex URL parsing, slow performance, varies by pattern
Choose `queryString()` when you want the entire parameter string for logging or forwarding. Use `extractURLParameter()` when targeting specific parameters like tracking codes or user IDs.
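As a quick illustration of the two targeted functions:
SELECT
    extractURLParameter('https://example.com/?utm_source=google&id=42', 'utm_source') AS source,
    extractURLParameterNames('https://example.com/?utm_source=google&id=42') AS param_names;
-- Returns: google, ['utm_source','id']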
2. When regex is still useful
Regex patterns become necessary for non-standard URL formats or when you want to extract multiple URL components simultaneously:
SELECT extractAllGroups(url, '([^?]+)\\?([^#]+)') AS url_parts
FROM urls_table;
-- Returns: [['https://example.com/path', 'param1=value1&param2=value2']]
However, regex parsing typically performs 3-5x slower than dedicated URL functions on large datasets. You'll notice this performance difference becomes significant when processing millions of URLs.
Version compatibility and behavior changes
The `queryString()` function has been available since ClickHouse 1.1.54388 with consistent behavior across versions. Recent versions include performance optimizations but maintain the same API contract.
Newer ClickHouse releases continue to add URL helper functions, but the core `queryString()` behavior remains unchanged. If you're running an older version, verify function availability with `SELECT version()` before deployment.
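You can also query the function registry directly; a quick check:
SELECT version();
SELECT name FROM system.functions WHERE name = 'queryString';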
Optimizing performance on billion-row tables
URL parsing performance becomes critical when processing large datasets, particularly for real-time analytics workloads. The difference between optimized and unoptimized approaches can mean the difference between sub-second and multi-minute query times.
1. Projections or materialized views
Pre-compute query string extraction using materialized views to avoid repeated parsing:
CREATE MATERIALIZED VIEW url_parsed_mv
ENGINE = MergeTree()
ORDER BY (timestamp, domain)
AS SELECT
timestamp,
domain,
queryString(full_url) AS query_string,
extractURLParameter(full_url, 'utm_source') AS utm_source
FROM raw_logs;
This approach reduces query latency from seconds to milliseconds on billion-row tables since the parsing happens once during ingestion. The trade-off is additional storage space for the parsed results.
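Queries then read the pre-parsed columns instead of re-parsing URLs; a minimal sketch against the view above:
SELECT utm_source, count() AS hits
FROM url_parsed_mv
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY utm_source
ORDER BY hits DESC;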
2. Avoiding on-the-fly regex
Replace regex-based URL parsing with ClickHouse's native functions wherever possible:
-- Slow: regex parsing
SELECT extractAllGroups(url, '\\?([^#]+)') FROM logs;
-- Fast: native function
SELECT queryString(url) FROM logs;
Native functions leverage ClickHouse's optimized C++ implementations and often include SIMD optimizations for string processing.
Streaming ingestion and on-the-fly parsing
Processing query strings during data ingestion rather than at query time reduces storage requirements and improves query performance. This approach works particularly well for high-throughput scenarios.
1. Kafka connector pattern
Extract query strings while ingesting from Kafka streams using materialized views:
-- Target table for parsed results (schema inferred from the materialized view below)
CREATE TABLE parsed_urls (
    timestamp DateTime,
    raw_url String,
    query_params String
) ENGINE = MergeTree()
ORDER BY timestamp;

CREATE TABLE kafka_raw (
    timestamp DateTime,
    raw_url String
) ENGINE = Kafka()
SETTINGS kafka_broker_list = 'localhost:9092',
    kafka_topic_list = 'web_logs',
    kafka_group_name = 'clickhouse_web_logs', -- required; any unique consumer group name
    kafka_format = 'JSONEachRow'; -- required; must match the message format

CREATE MATERIALIZED VIEW parsed_urls_mv TO parsed_urls AS
SELECT
    timestamp,
    raw_url,
    queryString(raw_url) AS query_params
FROM kafka_raw;
This pattern processes millions of URLs per second while maintaining low memory overhead. The parsing happens automatically as data flows through the system.
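Once events flow through, a quick sanity check on the target table:
SELECT raw_url, query_params
FROM parsed_urls
ORDER BY timestamp DESC
LIMIT 5;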
2. Materialized view pattern
Automatically parse URLs on insert using materialized views attached to your main table:
CREATE MATERIALIZED VIEW url_analytics_mv
ENGINE = SummingMergeTree() -- sums page_views for rows sharing a sorting key on merge
ORDER BY (date, domain, query_string) -- sorting key must cover every GROUP BY column
AS SELECT
    toDate(timestamp) AS date,
    domain(url) AS domain,
    queryString(url) AS query_string,
    count() AS page_views
FROM web_logs
GROUP BY date, domain, query_string;
This approach enables real-time URL analytics without impacting insert performance.
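Because SummingMergeTree collapses same-key rows only during background merges, reads should re-aggregate to be safe; a minimal sketch:
SELECT date, domain, sum(page_views) AS page_views
FROM url_analytics_mv
GROUP BY date, domain
ORDER BY date DESC;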
Parsing individual parameters into key-value maps
Beyond extracting the complete query string, you can parse individual parameters into ClickHouse's native Map data type for structured analysis. This becomes especially useful when you want to analyze specific parameters across millions of URLs.
1. extractKeyValuePairs for key-value pairs
Convert query strings into Map types for easier parameter access:
SELECT
url,
extractKeyValuePairs(queryString(url), '=', '&') AS params_map
FROM urls_table;
-- Returns: {'utm_source': 'google', 'utm_medium': 'cpc'}
The `extractKeyValuePairs()` function splits the query string on `&` for pairs and `=` for key-value separation, returning a native `Map(String, String)`. This creates a structured format for parameter analysis.
2. Accessing parameters with map subscripts
Extract specific parameter values from the parsed map:
SELECT
    url,
    extractKeyValuePairs(queryString(url), '=', '&')['utm_source'] AS traffic_source,
    extractKeyValuePairs(queryString(url), '=', '&')['campaign_id'] AS campaign
FROM marketing_data;
This pattern works particularly well for marketing attribution analysis where you want to isolate specific tracking parameters.
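For example, a sketch that aggregates visits by traffic source, assuming the marketing_data table above:
SELECT
    extractKeyValuePairs(queryString(url), '=', '&')['utm_source'] AS traffic_source,
    count() AS visits
FROM marketing_data
GROUP BY traffic_source
ORDER BY visits DESC;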
Building a real-time, ClickHouse-based URL analytics API with Tinybird
Let's create a complete example that ingests web traffic data and exposes an API endpoint returning query parameter analytics using `queryString()`.
This walkthrough shows how URL query parsing integrates into real-world analytics workflows, from data ingestion through API deployment. You'll see exactly how the function works in practice.
Tinybird is a managed service that makes it easy to build real-time analytics APIs powered by ClickHouse. It handles the infrastructure, scaling, and API generation so you can focus on writing SQL and building analytics features.
In this example, we'll use Tinybird to:
- Create a data source for ingesting web traffic events
- Write a pipe that processes and aggregates query parameters using `queryString()`
- Deploy an API endpoint that serves query parameter analytics
The workflow demonstrates how to combine ClickHouse's powerful URL processing with Tinybird's API capabilities to build production-ready analytics services.
1. Create the data source
First, create a data source to store your web traffic events:
SCHEMA >
`timestamp` DateTime64(3, 'UTC') `json:$.timestamp`,
`url` String `json:$.url`,
`user_id` UInt32 `json:$.user_id`,
`session_id` String `json:$.session_id`,
`page_title` String `json:$.page_title`,
`referrer` String `json:$.referrer`
ENGINE "MergeTree"
ENGINE_SORTING_KEY "timestamp"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
Now build the data source locally:
tb dev
This builds the project on Tinybird Local (which you'll need to have running) and creates your data source. Then ingest some sample data using the Events API:
curl -X POST "http://localhost:7181/v0/events?name=web_traffic" \
-H "Content-Type: application/json" \
-d '{"timestamp": "2025-10-03 14:30:15.123", "url": "https://www.example.com/products?utm_source=google&utm_medium=cpc&campaign_id=123", "user_id": 1001, "session_id": "sess_abc123", "page_title": "Products Page", "referrer": "https://www.google.com/search"}
{"timestamp": "2025-10-03 14:30:16.456", "url": "https://blog.example.com/article?id=456&source=newsletter", "user_id": 1002, "session_id": "sess_def456", "page_title": "Blog Article", "referrer": "https://www.facebook.com"}
{"timestamp": "2025-10-03 14:30:17.789", "url": "https://www.github.com/tinybird?tab=repositories&sort=stars", "user_id": 1003, "session_id": "sess_ghi789", "page_title": "GitHub Profile", "referrer": "https://www.twitter.com"}'
2. Write the pipe query
Create a pipe that analyzes query parameters:
NODE traffic_events
SQL >
SELECT
timestamp,
url,
queryString(url) AS query_string,
domain(url) AS domain,
extractURLParameter(url, 'utm_source') AS utm_source,
extractURLParameter(url, 'utm_medium') AS utm_medium,
extractURLParameter(url, 'campaign_id') AS campaign_id,
user_id,
session_id,
page_title
FROM web_traffic
WHERE timestamp >= now() - INTERVAL 7 DAY
AND queryString(url) != ''
NODE query_analytics
SQL >
%
SELECT
domain,
utm_source,
utm_medium,
campaign_id,
COUNT(*) AS total_events,
COUNT(DISTINCT user_id) AS unique_users,
COUNT(DISTINCT session_id) AS unique_sessions,
COUNT(DISTINCT query_string) AS unique_query_strings,
MIN(timestamp) AS first_seen,
MAX(timestamp) AS last_seen
FROM traffic_events
WHERE
    1 = 1
    {% if defined(domain) %}
    AND domain = {{ String(domain) }}
    {% end %}
    {% if defined(utm_source) %}
    AND utm_source = {{ String(utm_source) }}
    {% end %}
    {% if defined(date_from) %}
    AND timestamp >= {{ DateTime(date_from) }}
    {% end %}
    {% if defined(date_to) %}
    AND timestamp <= {{ DateTime(date_to) }}
    {% end %}
GROUP BY domain, utm_source, utm_medium, campaign_id
ORDER BY total_events DESC
LIMIT {{ Int32(limit, 100) }}
TYPE endpoint
The `queryString()` function ensures consistent query parameter extraction regardless of URL complexity.
3. Deploy and call the endpoint
Deploy your pipe to create the API endpoint:
tb --cloud deploy
Then call your new endpoint:
curl -H "Authorization: Bearer <your_token>" \
"https://api.tinybird.co/v0/pipes/query_analytics.json?domain=example.com&limit=5"
The response returns neatly aggregated data with query parameter insights:
{
"meta": [
{
"name": "domain",
"type": "String"
},
{
"name": "utm_source",
"type": "Nullable(String)"
},
{
"name": "utm_medium",
"type": "Nullable(String)"
},
{
"name": "campaign_id",
"type": "Nullable(String)"
},
{
"name": "total_events",
"type": "UInt64"
},
{
"name": "unique_users",
"type": "UInt64"
},
{
"name": "unique_sessions",
"type": "UInt64"
},
{
"name": "unique_query_strings",
"type": "UInt64"
},
{
"name": "first_seen",
"type": "DateTime"
},
{
"name": "last_seen",
"type": "DateTime"
}
],
"data": [
{
"domain": "example.com",
"utm_source": "google",
"utm_medium": "cpc",
"campaign_id": "123",
"total_events": 1247,
"unique_users": 892,
"unique_sessions": 1156,
"unique_query_strings": 23,
"first_seen": "2025-10-03 14:30:15",
"last_seen": "2025-10-03 15:45:22"
},
{
"domain": "example.com",
"utm_source": null,
"utm_medium": null,
"campaign_id": null,
"total_events": 456,
"unique_users": 234,
"unique_sessions": 345,
"unique_query_strings": 12,
"first_seen": "2025-10-03 14:30:16",
"last_seen": "2025-10-03 15:30:18"
}
],
"rows": 2,
"statistics": {
"elapsed": 0.045,
"rows_read": 2000,
"bytes_read": 89000
}
}
Next steps for fast URL analytics with ClickHouse
While ClickHouse's URL functions provide excellent parsing performance, managing ClickHouse clusters at scale introduces operational complexity. Tinybird's managed ClickHouse platform removes most of that infrastructure overhead while preserving ClickHouse-level query performance.
The platform includes built-in API generation, real-time ingestion, and automatic scaling. You can deploy URL parsing pipelines in minutes rather than weeks.
Sign up for a free Tinybird account to build and test your first ClickHouse-based API in just a few minutes.
Additional resources
- queryString function
- extractURLParameter function
- cutFragment function
- URL functions in ClickHouse
- String functions in ClickHouse
- How to extract URL fragments without hash symbols (#) in ClickHouse
- How to extract the protocol of a URL in ClickHouse
- How to extract domains without www from URLs in ClickHouse
Frequently asked questions about extracting ClickHouse query strings
Did queryString always strip the leading question mark?
Yes, the `queryString()` function has consistently returned clean query strings without the `?` prefix across all ClickHouse versions since its introduction. This behavior remains unchanged in current versions, ensuring backward compatibility for existing queries.
What happens if the URL contains encoded "?" or "#" characters?
ClickHouse treats URL-encoded characters like `%3F` (encoded `?`) and `%23` (encoded `#`) as literal characters within the query string. The function only recognizes actual `?` and `#` characters as delimiters, not their encoded equivalents.
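A quick sketch of that behavior:
SELECT queryString('https://example.com/path%3Ffake?real=1%23x#frag') AS qs;
-- Returns: real=1%23x (the encoded %3F and %23 are not treated as delimiters)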
Can I extract multiple query strings from a single text blob?
Yes, combine `extractAll()` with a regex pattern to find every URL in a text blob, then map `queryString()` over the matches:
SELECT arrayMap(x -> queryString(x), extractAll(text, 'https?://[^\\s]+')) AS all_query_strings
FROM text_data;
This approach works well for parsing log files or documents containing multiple URLs.
How do I remove the query string entirely from a URL column?
Use the `cutQueryString()` function to get clean URLs without parameters:
SELECT cutQueryString(url) AS clean_url FROM urls_table;
-- Converts: https://example.com/path?param=value
-- To: https://example.com/path
This function proves useful for URL normalization in analytics where you want to group pages regardless of their parameters.