URL encoding in ClickHouse requires a different approach than you might expect from other databases or programming languages. Instead of a generic urlEncode
function, ClickHouse provides encodeURLFormComponent
, which follows web form standards and treats spaces differently than JavaScript's equivalent function.
This guide covers the syntax, behavior differences, performance considerations, and practical examples for safely encoding query parameters in your ClickHouse workflows.
What encodeURLFormComponent
does
ClickHouse doesn't have a built-in urlEncode
function like you might expect, but it does provide encodeURLFormComponent
to safely encode strings for URL query parameters. This function replaces characters that have special meaning in URLs with percent-encoded representations that browsers and servers can safely parse.
Without proper encoding, a query parameter like search=coffee & tea
would break your URL structure since the ampersand has special meaning in query strings. The function handles this by converting problematic characters into their safe equivalents.
You'll find this useful when building APIs or processing user input that gets embedded in URLs. Rather than manually replacing each special character, encodeURLFormComponent
handles the entire conversion process according to web standards.
Syntax and basic example
The function signature is straightforward and takes any UTF-8 string as input:
encodeURLFormComponent(x: String) → String
1. Minimal select statement
Here's how the function handles various special characters in a single query:
SELECT encodeURLFormComponent('email=a+b@example.com & tags=music/rock');
This returns: email%3Da%2Bb%40example.com+%26+tags%3Dmusic%2Frock
Notice the transformation pattern: equals signs become %3D
, plus signs become %2B
, at symbols become %40
, spaces become +
, ampersands become %26
, and forward slashes become %2F
.
2. Insert-select pattern for batch encoding
You can encode entire columns during data ingestion or transformation. This approach works well when processing large datasets:
INSERT INTO target_table
SELECT
id,
encodeURLFormComponent(raw_param) AS encoded_param
FROM source_table;
Alternatively, create a new table with encoded values using a single statement:
CREATE TABLE encoded_params AS
SELECT
id,
encodeURLFormComponent(param) AS param_encoded
FROM source_table;
How it differs from JavaScript encodeURIComponent
The differences between ClickHouse's encodeURLFormComponent
and JavaScript's encodeURIComponent
matter when you're migrating logic between client and server environments.
Space character handling
The most noticeable difference involves how each function treats spaces. ClickHouse follows the application/x-www-form-urlencoded standard, which uses plus signs for spaces. JavaScript uses percent-encoding for everything, including spaces.
Plus sign versus percent-20
This comparison shows the key difference:
-- ClickHouse
SELECT encodeURLFormComponent('a b+c'); -- Returns: 'a+b%2Bc'
// JavaScript
encodeURIComponent('a b+c'); // Returns: 'a%20b%2Bc'
In ClickHouse, spaces become +
while literal plus signs become %2B
. JavaScript percent-encodes both spaces and plus signs differently, using %20
for spaces and %2B
for plus signs.
Reserved character set
The functions also differ in their approach to "safe" characters:
- ClickHouse
encodeURLFormComponent
: Focuses on form-style query strings, encoding characters like=
,&
,+
, and most non-alphanumerics while treating spaces as+
- JavaScript
encodeURIComponent
: Encodes all characters exceptA-Z a-z 0-9 - _ . ! ~ * ' ( )
and uses%20
for spaces instead of+
If you need percent-20 encoding for spaces in ClickHouse, encodeURL
might be more appropriate than encodeURLFormComponent
.
Decoding with decodeURLComponent
Use decodeURLComponent
to reverse the encoding process. This function interprets +
as space and converts percent escapes back to their original UTF-8 characters.
Round-trip verification query
Test that your encoding and decoding work correctly:
SELECT
s,
encodeURLFormComponent(s) AS enc,
decodeURLComponent(enc) AS round_trip
FROM
(
SELECT 'a b+c & €' AS s
);
The round_trip
column returns the original string, confirming that the encode-decode process preserves your data integrity.
Handling arrays, UTF-8 and nulls
The function works with UTF-8 strings and integrates well with ClickHouse's array functions. However, you'll want to consider multibyte characters and null handling in your schemas.
UTF-8 multibyte characters
International characters and emojis get percent-encoded as their UTF-8 byte sequences:
SELECT encodeURLFormComponent('café ☕'); -- Returns: 'caf%C3%A9+%E2%98%95'
Each multibyte character becomes multiple percent-encoded bytes, which follows the correct URL encoding standards.
Null and empty string behavior
The function handles edge cases predictably:
NULL values: Stay NULL if your column is
Nullable(String)
. Wrap withcoalesce
if you prefer empty strings:SELECT encodeURLFormComponent(COALESCE(col, '')) FROM table_name;
Empty strings: Pass through unchanged as
''
Encoding an array column
Apply encoding to each array element using arrayMap
:
SELECT arrayMap(x -> encodeURLFormComponent(x), params) AS params_encoded
FROM events;
This pattern works well when you have multiple parameters stored as an array that all need encoding.
Avoiding double encoding in queries
Applying encoding multiple times corrupts your data by turning %
into %25
and breaking downstream consumers. The key is encoding values exactly once at the boundary where they enter URL contexts.
Detecting pre-encoded values
Use pattern matching to check if strings might already be encoded:
SELECT
s,
countRegexp(s, '%[0-9A-Fa-f]{2}') AS pct_escapes,
position(s, '+') AS has_plus
FROM sample_table;
High pct_escapes
counts or unexpected +
characters often indicate pre-encoded values.
Safe parameter ordering pattern
Follow these practices to prevent double encoding:
- Store raw data unencoded in your tables
- Encode at read-time when constructing URLs or API responses, or encode once at ingestion in a dedicated column
- Centralize encoding in a single view or materialized view to prevent accidental duplication
- Avoid encoding the same field both during ingestion and again in downstream queries
Performance considerations for large batches
Encoding is CPU-bound and scales with input length and the number of characters requiring escapes. The function performs efficiently for most workloads.
CPU cost benchmarks
Relative to other string functions, encodeURLFormComponent
performs similarly to replaceAll
or lowerUTF8
. It typically costs less than heavy regex operations and processes hundreds of MB/s per core, depending on your hardware and character distribution.
Monitor actual performance using system.query_log
to measure costs for your specific dataset and usage patterns.
Materialized view strategy
For frequently accessed data, consider pre-encoding during ingestion to shift CPU cost away from query time:
CREATE TABLE raw_events (
id UInt64,
param String,
timestamp DateTime
) ENGINE = MergeTree
ORDER BY (timestamp, id);
CREATE MATERIALIZED VIEW mv_events_encoded TO events_encoded AS
SELECT
id,
encodeURLFormComponent(param) AS param_encoded,
timestamp
FROM raw_events;
This approach keeps queries fast by doing encoding work once during ingestion rather than repeatedly during queries.
Building ClickHouse-based web analytics APIs with Tinybird
Let's create a complete example that demonstrates how to build a real-time API for URL parameter encoding using Tinybird's managed ClickHouse platform.
This walkthrough shows how to ingest web analytics data, process URL parameters with ClickHouse functions, and expose encoded parameter analytics through a real-time API endpoint.
1. Create the data source
First, create a data source to store web analytics events:
SCHEMA >
`timestamp` DateTime64(3) `json:$.timestamp`,
`user_id` String `json:$.user_id`,
`page_url` String `json:$.page_url`,
`search_query` String `json:$.search_query`,
`utm_source` String `json:$.utm_source`,
`utm_medium` String `json:$.utm_medium`
ENGINE MergeTree
ENGINE_SORTING_KEY timestamp
ENGINE_PARTITION_KEY toYYYYMM(timestamp)
Now build the project locally:
tb dev
This builds the project on Tinybird Local and creates your data source. Then ingest some sample data using the Events API:
curl -X POST "http://localhost:7181/v0/events?name=web_analytics" \
-H "Content-Type: application/json" \
-d '{"timestamp": "2025-01-15 14:30:15.123", "user_id": "user123", "page_url": "https://shop.example.com/search?q=coffee & tea", "search_query": "coffee & tea", "utm_source": "google", "utm_medium": "cpc"}
{"timestamp": "2025-01-15 14:30:16.456", "user_id": "user456", "page_url": "https://shop.example.com/products?category=electronics&brand=apple", "search_query": "", "utm_source": "facebook", "utm_medium": "social"}
{"timestamp": "2025-01-15 14:30:17.789", "user_id": "user789", "page_url": "https://shop.example.com/about", "search_query": "", "utm_source": "", "utm_medium": ""}'
2. Create the pipe files
Create a pipe that processes URL data and provides analytics:
NODE web_events
SQL >
SELECT
timestamp,
user_id,
page_url,
search_query,
utm_source,
utm_medium,
encodeURLFormComponent(search_query) AS encoded_search,
encodeURLFormComponent(utm_source) AS encoded_utm_source,
encodeURLFormComponent(utm_medium) AS encoded_utm_medium
FROM web_analytics
WHERE timestamp >= now64(3) - INTERVAL 1 HOUR
NODE url_analytics
SQL >
%
SELECT
encoded_search,
encoded_utm_source,
encoded_utm_medium,
COUNT(*) AS search_count,
COUNT(DISTINCT user_id) AS unique_users,
MIN(timestamp) AS first_seen,
MAX(timestamp) AS last_seen
FROM web_events
WHERE
{\% if defined(utm_source) %}
encoded_utm_source = {{ String(utm_source) }}
{\% end %}
{\% if defined(hours_back) %}
AND timestamp >= now64(3) - INTERVAL {{ Int32(hours_back) }} HOUR
{\% end %}
GROUP BY encoded_search, encoded_utm_source, encoded_utm_medium
ORDER BY search_count DESC
LIMIT {{ Int32(limit, 100) }}
TYPE endpoint
3. Deploy and test the API
Deploy your project to Tinybird Cloud:
tb --cloud deploy
Then test your API endpoint:
# Get all search analytics from the last hour
curl -H "Authorization: Bearer <your_token>" \
"https://api.tinybird.co/v0/pipes/url_analytics.json"
# Filter by UTM source
curl -H "Authorization: Bearer <your_token>" \
"https://api.tinybird.co/v0/pipes/url_analytics.json?utm_source=google"
# Filter by time range
curl -H "Authorization: Bearer <your_token>" \
"https://api.tinybird.co/v0/pipes/url_analytics.json?hours_back=2"
4. Sample API response
The API returns structured data with encoded parameter analytics:
{
"meta": [
{
"name": "encoded_search",
"type": "String"
},
{
"name": "encoded_utm_source",
"type": "String"
},
{
"name": "encoded_utm_medium",
"type": "String"
},
{
"name": "search_count",
"type": "UInt64"
},
{
"name": "unique_users",
"type": "UInt64"
},
{
"name": "first_seen",
"type": "DateTime64(3)"
},
{
"name": "last_seen",
"type": "DateTime64(3)"
}
],
"data": [
{
"encoded_search": "coffee+%26+tea",
"encoded_utm_source": "google",
"encoded_utm_medium": "cpc",
"search_count": 1247,
"unique_users": 892,
"first_seen": "2025-01-15 14:30:15.123",
"last_seen": "2025-01-15 15:45:22.789"
},
{
"encoded_search": "",
"encoded_utm_source": "facebook",
"encoded_utm_medium": "social",
"search_count": 456,
"unique_users": 234,
"first_seen": "2025-01-15 14:30:16.456",
"last_seen": "2025-01-15 15:30:18.345"
}
],
"rows": 2,
"statistics": {
"elapsed": 0.045,
"rows_read": 2000,
"bytes_read": 89000
}
}
Related ClickHouse URL functions
ClickHouse includes several URL utilities that complement encodeURLFormComponent
for different parts of URL handling workflows.
encodeURL
for full URLs
Use encodeURL
when you need percent-encoding for entire URLs or specifically want spaces as %20
rather than +
. This function suits path components and non-form contexts where the plus-as-space convention doesn't apply.
extractURLParameter
for retrieval
Extract parameter values from complete URLs using extractURLParameter(url, name)
. Combine it with decodeURLComponent
to properly interpret plus-as-space and percent escapes:
SELECT decodeURLComponent(extractURLParameter(url, 'q')) AS search_term
FROM page_views;
This pattern lets you safely extract and decode URL parameters that were encoded with encodeURLFormComponent
.
Next steps for web analytics APIs with ClickHouse
Managing ClickHouse infrastructure for URL parameter encoding involves complexity—from optimizing storage and query performance to handling encoding edge cases and maintaining consistent data quality across distributed systems.
Tinybird's managed ClickHouse service reduces infrastructure overhead by providing a managed ClickHouse environment that abstracts optimization, scaling, and maintenance. Tinybird exposes ClickHouse features such as encodeURLFormComponent()
and URL parsing without requiring direct infrastructure management, including database clusters, performance monitoring, or version upgrades.
This enables teams to focus on application development rather than database operations or configuring ClickHouse for production workloads. Tinybird's platform supports developer efficiency through managed scaling and workflow integrations.
Sign up for a free Tinybird account to build and test your first ClickHouse-based API in just a few minutes.
Additional resources
- encodeURLFormComponent function
- decodeURLComponent function
- encodeURL function
- extractURLParameter function
- URL functions overview
- How to decode URL-encoded strings in ClickHouse
- Extract query parameter in ClickHouse
- Extract URL fragment in ClickHouse
FAQs about URL-encoding in ClickHouse
Can I run encodeURLFormComponent
inside an INSERT statement?
Yes, you can use it directly in INSERT ... SELECT
statements to encode during ingestion, or in a materialized view to automatically populate an encoded column as new data arrives.
Which ClickHouse version added encodeURLFormComponent
?
The function was introduced in ClickHouse 21.4 and is available in all recent stable versions, including the managed ClickHouse provided by Tinybird.