URL analysis often stops at counting page views per path, missing the bigger picture of how users navigate through your site's structure. When you treat /blog/tech/databases/clickhouse-guide as completely separate from /blog/tech/databases or /blog/tech, you lose insights about traffic patterns at different levels of your content hierarchy.
ClickHouse's URLHierarchy function solves this by expanding any URL into each of its parent levels, truncated at natural boundaries like slashes and query separators, to reveal navigation patterns across your entire site structure. This guide covers everything from basic syntax through production deployment, including performance optimization techniques and real-world implementation patterns for building URL hierarchy analytics that scale.
What URLHierarchy does and why you need it
URLHierarchy takes any URL and breaks it down into a set of progressively longer prefixes, cutting at natural boundaries like forward slashes and question marks. When you feed it https://example.com/blog/category/post-title?param=value, you get back an array that starts with the protocol and host, grows by one path segment at a time, and ends with the full URL including the query string.
This becomes incredibly useful when you're analyzing how people navigate through your website. Instead of treating each URL as a separate entity, you can see traffic patterns at different levels of your site structure—from broad categories down to specific pages.
Basic syntax and return format
The function signature is straightforward: URLHierarchy(url) takes a String and returns an Array(String). Each element in that array represents one level of the URL hierarchy, with the shortest version first.
Here's what happens when you run it:
SELECT URLHierarchy('https://shop.example.com/electronics/laptops/gaming?brand=dell')
You'll get an array with these elements, shortest first:
https://shop.example.com/
https://shop.example.com/electronics/
https://shop.example.com/electronics/laptops/
https://shop.example.com/electronics/laptops/gaming?
https://shop.example.com/electronics/laptops/gaming?brand=dell
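To make the truncation rule concrete, here is a small Python sketch that mimics the documented behavior (cut after each run of / or ? separators past the scheme and host, shortest element first). This is an illustration of the rule, not ClickHouse's actual implementation:

```python
import re

def url_hierarchy(url):
    """Mimic ClickHouse URLHierarchy: truncate after each run of '/' or '?'
    separators past the scheme and host, shortest element first."""
    m = re.match(r'^[a-z][a-z0-9+.-]*://[^/?#]*', url)
    start = m.end() if m else 0
    levels = []
    i = start
    while i < len(url):
        if url[i] in '/?':
            # consecutive separators count as one; cut after the whole run
            while i + 1 < len(url) and url[i + 1] in '/?':
                i += 1
            levels.append(url[:i + 1])
        i += 1
    if not levels or levels[-1] != url:
        levels.append(url)  # the full URL is always the last element
    return levels

for level in url_hierarchy('https://shop.example.com/electronics/laptops/gaming?brand=dell'):
    print(level)
```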
Parameters and types
URLHierarchy accepts one String parameter containing the URL to process. The return type is Array(String), where each array element represents a truncation point in the original URL.
The function has been available since the earliest open-source ClickHouse releases, though edge case handling has improved over time. If you're running an older version, you might encounter inconsistencies with complex URL formats.
Sample return array for a deep URL
Let's look at a more complex example to see how the function handles different URL components:
SELECT URLHierarchy('https://docs.example.com/api/v2/users/123/settings?tab=security&view=advanced#section-2')
The returned array contains one element per / and ? boundary, starting from the protocol and host and growing toward the full URL. The fragment (#section-2) is not a truncation point of its own: it appears only as part of the final, complete-URL element.
Step-by-step: Build a URL hierarchy query
Building useful analytics from URLHierarchy requires combining it with array manipulation functions. The process involves generating the hierarchy array, then exploding it into separate rows for analysis.
1. Select raw URLs
Start with your source table containing URL data. Most web analytics scenarios involve pageview or session tables:
SELECT url, user_id, timestamp
FROM pageviews
WHERE date = today()
LIMIT 10
Your URL column typically contains complete request URLs with paths, query parameters, and sometimes fragments that you want to analyze at different hierarchy levels.
2. Apply URLHierarchy
Add the URLHierarchy function to generate hierarchy arrays for each URL:
SELECT
url,
URLHierarchy(url) AS url_levels
FROM pageviews
WHERE date = today()
LIMIT 10
At this point, each row contains the original URL plus an array of all its hierarchy levels. The array format isn't immediately useful for aggregation—you need to convert it into separate rows.
3. Explode with arrayJoin
Use arrayJoin to flatten the hierarchy array, creating one row for each level:
SELECT
original_url,
hierarchy_level
FROM (
SELECT
url AS original_url,
URLHierarchy(url) AS url_levels
FROM pageviews
WHERE date = today()
)
ARRAY JOIN url_levels AS hierarchy_level
Now each original URL generates multiple rows—one for each level in its path hierarchy. This transformation enables you to aggregate traffic at each path depth.
4. Aggregate metrics by depth
Group by hierarchy level to count visits and analyze traffic patterns:
SELECT
hierarchy_level,
COUNT(*) AS page_views,
COUNT(DISTINCT user_id) AS unique_visitors
FROM (
SELECT
url,
user_id,
URLHierarchy(url) AS url_levels
FROM pageviews
WHERE date >= today() - 7
)
ARRAY JOIN url_levels AS hierarchy_level
GROUP BY hierarchy_level
ORDER BY page_views DESC
LIMIT 20
This query reveals which sections of your site receive the most traffic when you consider the hierarchical nature of URL paths.
Edge cases and path boundary rules
URLHierarchy handles various URL formats, but understanding its behavior with edge cases helps you write more robust queries and interpret results correctly.
Handling query strings
URLHierarchy always truncates at the ? character that separates the path from query parameters. When your URL contains search parameters or filters, the function includes both the full URL with parameters and a version cut at the query boundary.
This behavior becomes important when analyzing search pages or filtered product listings—you might want to group URLs by their base path regardless of the specific parameters users applied.
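For example, if you want to bucket filtered listing URLs by their base path, you can strip the query before hierarchy analysis. A quick Python sketch of the idea (in ClickHouse itself, the cutQueryString function achieves the same thing):

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))

urls = [
    'https://shop.example.com/search?q=laptop',
    'https://shop.example.com/search?q=phone&sort=price',
]
# Both filtered variants collapse to the same base path
print({strip_query(u) for u in urls})
```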
Double or trailing slashes
Consecutive forward slashes and trailing slashes create unexpected hierarchy levels. URLHierarchy treats https://example.com//blog/ differently from https://example.com/blog/, potentially creating empty or redundant hierarchy levels in your results.

You can normalize URLs before processing with replaceRegexpAll(url, '([^:])/+', '\\1/') to collapse repeated slashes in the path (a bare '/+' pattern would also mangle the :// after the scheme), though this adds complexity to your queries.
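One caveat when collapsing slashes: a naive '/+' pattern also collapses the '//' in 'https://'. A pattern that only collapses a run of slashes following a non-colon character leaves the scheme separator intact. Here is a Python sketch of that regex; the same pattern works with ClickHouse's replaceRegexpAll:

```python
import re

def collapse_slashes(url):
    """Collapse repeated '/' in the path while leaving '://' intact:
    a run of slashes is only collapsed when preceded by a non-colon char."""
    return re.sub(r'([^:])/+', r'\1/', url)

print(collapse_slashes('https://example.com//blog///post/'))
```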
Non-HTTP schemes
URLHierarchy works with non-HTTP schemes like ftp://, file://, or custom application schemes. However, the hierarchy logic remains focused on slash-separated paths, which might not match the conventions of specialized schemes.
For custom schemes, you might need to combine URLHierarchy with string manipulation functions to achieve the hierarchy structure your analysis requires.
Performance tips for large-scale web data
Processing millions of URLs with URLHierarchy can consume significant memory and processing resources, especially when arrayJoin creates multiple output rows for each input URL.
Memory-safe array joins
Large hierarchy arrays can cause memory pressure when arrayJoin explodes them into separate rows. Consider batching your queries by date ranges or limiting the number of URLs processed at once:
SELECT hierarchy_level, COUNT(*) AS visits
FROM (
SELECT URLHierarchy(url) AS url_levels
FROM pageviews
WHERE date = today() - 1 -- Process one day at a time
AND cityHash64(user_id) % 100 = 0 -- Sample ~1% of users (hash first: user_id is a String)
)
ARRAY JOIN url_levels AS hierarchy_level
GROUP BY hierarchy_level
This approach prevents memory overflow while still allowing you to process large datasets effectively.
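Note that the sampling predicate hashes user_id before taking the modulo (in ClickHouse, cityHash64(user_id) % 100 = 0, since a direct modulo on a String column fails). Hashing also makes the sample deterministic: all of a given user's rows stay in or out of the sample together, unlike random sampling. A Python sketch of the same idea, with md5 standing in for cityHash64:

```python
import hashlib

def in_sample(user_id, pct=1):
    """Keep a stable pct% of users by hashing the id into 100 buckets."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return bucket < pct

users = ['user_%d' % i for i in range(1000)]
kept = [u for u in users if in_sample(u)]
print(len(kept))  # roughly 1% of 1000 users
```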
LowCardinality dictionaries
Many websites follow predictable URL patterns where the same paths appear repeatedly. If your URL column has a bounded number of distinct values, declaring it as LowCardinality(String) can improve compression and query performance:
CREATE TABLE pageviews (
url LowCardinality(String),
user_id String,
timestamp DateTime
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
LowCardinality encoding works particularly well when your URLs follow consistent patterns with repeated domain names and common path structures. For columns with millions of distinct URLs, however, a plain String column often compresses and performs better.
Using projections or indexes
Projections store pre-computed results inside the table that ClickHouse uses automatically when a query matches the projection structure. However, a projection definition only supports a plain SELECT with GROUP BY over the table's own columns (no subqueries and no ARRAY JOIN), so you can't project an exploded URLHierarchy directly. What you can do is precompute the hierarchy into a materialized column, so it's stored at insert time instead of recomputed on every query:
ALTER TABLE pageviews
ADD COLUMN url_levels Array(String) MATERIALIZED URLHierarchy(url)
Queries can then ARRAY JOIN url_levels directly without calling URLHierarchy at read time. For fully pre-aggregated hierarchy counts, use the materialized view approach in the next section.
Persisting hierarchies with a materialized view
When you're repeatedly analyzing URL hierarchies, creating a materialized view eliminates the need to recompute URLHierarchy results for every query: the hierarchy is exploded and aggregated once at insert time, and subsequent queries read the pre-computed results directly.
View definition
Create a materialized view that processes incoming URL data and stores the exploded hierarchy:
CREATE MATERIALIZED VIEW url_hierarchy_stats
ENGINE = AggregatingMergeTree()
ORDER BY (hierarchy_level, date)
AS SELECT
hierarchy_level,
toDate(timestamp) AS date,
countState() AS visits,
uniqState(user_id) AS unique_visitors
FROM (
SELECT
timestamp,
user_id,
URLHierarchy(url) AS url_levels
FROM pageviews
)
ARRAY JOIN url_levels AS hierarchy_level
GROUP BY hierarchy_level, toDate(timestamp)
This materialized view automatically processes new data as it arrives, maintaining pre-computed hierarchy statistics. Note the AggregatingMergeTree engine: uniq() values can't be merged correctly by summing, so the view stores aggregate states instead, and you query them with the -Merge combinators (countMerge(visits), uniqMerge(unique_visitors)).
Refresh and reparse strategy
Materialized views in ClickHouse only process new inserts—they don't automatically handle changes to existing data. If you need to reprocess historical data or handle URL structure changes, you'll need to recreate the view:
DROP VIEW url_hierarchy_stats;
-- Recreate with updated logic
CREATE MATERIALIZED VIEW url_hierarchy_stats ...
For production systems, consider implementing a periodic refresh strategy or using incremental processing to handle data updates without full rebuilds.
Related ClickHouse URL functions and when to use them
URLHierarchy works alongside other ClickHouse URL functions, each optimized for different analytical scenarios. Understanding when to use each function helps you choose the right tool for your specific use case.
URL component functions
ClickHouse has no single parseURL function. Instead, it provides dedicated extractors (protocol, domain, path, queryString, fragment) for pulling out structured components when you need specific parts rather than hierarchical analysis:
SELECT
domain(url) AS host,
path(url) AS url_path,
queryString(url) AS query_string
FROM pageviews
Use these component functions when building domain-level reports or extracting specific URL parts for filtering and grouping operations.
URLPathHierarchy
URLPathHierarchy works like URLHierarchy but omits the protocol and host, returning only path-level prefixes:
SELECT
URLPathHierarchy('https://example.com/browse/CONDUCTORS/TUBE-TRAPS') AS path_levels
This returns ['/browse/', '/browse/CONDUCTORS/', '/browse/CONDUCTORS/TUBE-TRAPS']. Choose URLPathHierarchy over URLHierarchy when you want to aggregate the same paths across multiple domains, or when the host portion isn't relevant to your analysis.
domain
The domain function extracts just the hostname without protocol, making it ideal for grouping by site:
SELECT
domain(referrer) as referrer_domain,
COUNT() as referrals
FROM pageviews
WHERE referrer != ''
GROUP BY referrer_domain
ORDER BY referrals DESC
Domain extraction works particularly well for referrer analysis and understanding traffic sources.
topLevelDomain
topLevelDomain returns only the TLD without the leading dot (com, org, uk), which helps with geographic or organizational analysis:
SELECT
topLevelDomain(url) as tld,
COUNT() as visits
FROM pageviews
GROUP BY tld
ORDER BY visits DESC
This function works well for understanding the geographic distribution of your traffic or analyzing different types of organizations visiting your site.
Troubleshooting common errors
URLHierarchy can produce unexpected results when working with malformed URLs or edge cases in real-world data. Here are the most common issues and how to handle them.
NULL or empty returns
URLHierarchy returns an empty array for empty or unparseable strings (for a non-Nullable String column it never returns NULL). Handle these cases defensively:
SELECT
url,
CASE
WHEN length(URLHierarchy(url)) > 0
THEN URLHierarchy(url)
ELSE ['invalid_url']
END as hierarchy_array
FROM pageviews
WHERE url IS NOT NULL AND url != ''
This pattern ensures your arrayJoin operations don't lose rows due to empty hierarchy arrays.
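To see why this matters, here is a tiny Python simulation of ARRAY JOIN semantics: one output row per array element, so a row with an empty hierarchy array produces no output rows and silently disappears:

```python
rows = [
    ('https://example.com/a/b',
     ['https://example.com/', 'https://example.com/a/', 'https://example.com/a/b']),
    ('not a url', []),  # malformed input -> empty hierarchy array
]

# ARRAY JOIN: emit (url, level) once per array element; empty arrays emit nothing
exploded = [(url, level) for url, levels in rows for level in levels]
print(len(exploded))  # 3 rows: the malformed row vanished
```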
Version compatibility issues
URLHierarchy behavior has evolved across ClickHouse versions, particularly around handling IPv6 addresses and international domain names, with later releases improving support for complex URL formats.
If you're seeing inconsistent results after upgrading ClickHouse, test your URLHierarchy queries against a sample of your production data to identify behavioral changes.
Unexpected array lengths
URLs with unusual structures can produce hierarchy arrays with more or fewer elements than expected. Debug unexpected array lengths by examining the raw URLHierarchy output:
SELECT
url,
URLHierarchy(url) as hierarchy,
length(URLHierarchy(url)) as hierarchy_depth
FROM pageviews
WHERE length(URLHierarchy(url)) > 10 -- Find unusually deep hierarchies
LIMIT 10
This query helps identify URLs that create unexpectedly complex hierarchies, often due to malformed paths or unusual URL structures.
Building a ClickHouse-based "content drilldown" API with Tinybird
A common URL hierarchy analytics use case is a content drilldown report that shows how users navigate through your site's content hierarchy, from broad subfolders down to specific pages. It reveals which sections drive the most traffic and where users drop off in your content funnel. Building this functionality requires aggregating page views across different levels of your URL hierarchy, exactly what ClickHouse's URLHierarchy function enables. The same analysis also provides a foundation for more advanced scenarios, such as per-tenant analytics and AI-powered insights into behavior across customer segments.
Here's how to build a complete content drilldown analytics API that replicates and extends Google Analytics' content reporting capabilities using Tinybird's managed ClickHouse service.
Step 1: Set up content analytics data source
Create a data source optimized for content drilldown analysis. This captures page views with the URL structure needed for hierarchical content reporting:
SCHEMA >
`timestamp` DateTime64(3, 'UTC') `json:$.timestamp`,
`session_id` String `json:$.session_id`,
`user_id` String `json:$.user_id`,
`page_url` String `json:$.page_url`,
`page_title` String `json:$.page_title`,
`content_category` String `json:$.content_category`,
`content_type` String `json:$.content_type`,
`referrer_url` String `json:$.referrer_url`,
`user_agent` String `json:$.user_agent`,
`country` LowCardinality(String) `json:$.country`,
`device_type` LowCardinality(String) `json:$.device_type`,
`is_bounce` UInt8 `json:$.is_bounce`,
`time_on_page` UInt32 `json:$.time_on_page`
ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, session_id"
Step 2: Ingest sample content data
Start the local Tinybird development server:
tb dev
Ingest sample content analytics data that represents a typical content hierarchy:
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TINYBIRD_TOKEN" \
"http://localhost:7181/v0/events?name=web_analytics" \
-d '[
{
"timestamp": "2024-12-01 10:30:00",
"session_id": "sess_001",
"user_id": "user_123",
"page_url": "https://example.com/blog/tech/databases/clickhouse-guide",
"page_title": "Complete ClickHouse Guide for Analytics",
"content_category": "Technology",
"content_type": "Tutorial",
"referrer_url": "https://google.com/search",
"user_agent": "Mozilla/5.0 Chrome/91.0",
"country": "US",
"device_type": "desktop",
"is_bounce": 0,
"time_on_page": 180
},
{
"timestamp": "2024-12-01 11:15:00",
"session_id": "sess_002",
"user_id": "user_456",
"page_url": "https://example.com/blog/tech/databases",
"page_title": "Database Articles",
"content_category": "Technology",
"content_type": "Category",
"referrer_url": "https://example.com/blog/tech",
"user_agent": "Mozilla/5.0 Safari/14.0",
"country": "UK",
"device_type": "mobile",
"is_bounce": 0,
"time_on_page": 45
},
{
"timestamp": "2024-12-01 12:45:00",
"session_id": "sess_003",
"user_id": "user_789",
"page_url": "https://example.com/blog/tech",
"page_title": "Technology Blog",
"content_category": "Technology",
"content_type": "Section",
"referrer_url": "https://example.com/",
"user_agent": "Mozilla/5.0 Firefox/89.0",
"country": "CA",
"device_type": "desktop",
"is_bounce": 0,
"time_on_page": 120
},
{
"timestamp": "2024-12-01 14:20:00",
"session_id": "sess_004",
"user_id": "user_012",
"page_url": "https://example.com/blog",
"page_title": "Blog Homepage",
"content_category": "Blog",
"content_type": "Landing",
"referrer_url": "https://example.com/",
"user_agent": "Mozilla/5.0 Chrome/91.0",
"country": "US",
"device_type": "desktop",
"is_bounce": 1,
"time_on_page": 15
}
]'
Step 3: Build content drilldown APIs
We can now create APIs that replicate Google Analytics' content drilldown functionality. These show traffic patterns across your content hierarchy from broad categories to specific pages.
Create content_drilldown.pipe for the main content drilldown report:
NODE content_drilldown
SQL >
%
WITH content_hierarchy AS (
SELECT
timestamp,
session_id,
user_id,
page_url,
page_title,
content_category,
content_type,
country,
device_type,
is_bounce,
time_on_page,
URLHierarchy(page_url) AS url_levels
FROM web_analytics
WHERE timestamp >= {{DateTime(start_date, '2024-12-01 00:00:00')}}
AND timestamp <= {{DateTime(end_date, '2024-12-02 00:00:00')}}
{% if defined(domain_filter) %}
AND domain(page_url) = {{String(domain_filter)}}
{% end %}
{% if defined(category_filter) %}
AND content_category = {{String(category_filter)}}
{% end %}
{% if defined(country_filter) %}
AND country = {{String(country_filter)}}
{% end %}
)
SELECT
hierarchy_level,
path(hierarchy_level) AS content_path,
page_title,
content_category,
content_type,
count() AS page_views,
uniq(session_id) AS unique_sessions,
uniq(user_id) AS unique_users,
countIf(is_bounce = 1) AS bounces,
round(countIf(is_bounce = 1) * 100.0 / count(), 2) AS bounce_rate_percent,
round(avg(time_on_page), 2) AS avg_time_on_page,
-- Calculate hierarchy depth for drilldown navigation
length(splitByChar('/', path(hierarchy_level))) - 1 AS hierarchy_depth,
-- Content performance metrics
round(count() * 100.0 / sum(count()) OVER (), 2) AS traffic_percentage,
-- Top referrers for this content level
topK(3)(referrer_url) AS top_referrers
FROM content_hierarchy
ARRAY JOIN url_levels AS hierarchy_level
WHERE hierarchy_level != ''
AND hierarchy_level IS NOT NULL
{% if defined(min_depth) %}
AND length(splitByChar('/', path(hierarchy_level))) - 1 >= {{Int32(min_depth, 0)}}
{% end %}
{% if defined(max_depth) %}
AND length(splitByChar('/', path(hierarchy_level))) - 1 <= {{Int32(max_depth, 10)}}
{% end %}
GROUP BY hierarchy_level, page_title, content_category, content_type, hierarchy_depth
ORDER BY page_views DESC
{% if defined(limit) %}
LIMIT {{Int32(limit, 100)}}
{% end %}
TYPE ENDPOINT
Create a content funnel analysis API in content_funnel_analysis.pipe:
NODE content_funnel_analysis
SQL >
%
WITH content_hierarchy AS (
SELECT
timestamp,
session_id,
user_id,
page_url,
page_title,
content_category,
content_type,
is_bounce,
time_on_page,
URLHierarchy(page_url) AS url_levels
FROM web_analytics
WHERE timestamp >= {{DateTime(start_date, '2024-12-01 00:00:00')}}
AND timestamp <= {{DateTime(end_date, '2024-12-02 00:00:00')}}
{% if defined(domain_filter) %}
AND domain(page_url) = {{String(domain_filter)}}
{% end %}
{% if defined(category_filter) %}
AND content_category = {{String(category_filter)}}
{% end %}
),
funnel_metrics AS (
SELECT
hierarchy_level,
path(hierarchy_level) AS content_path,
page_title,
content_category,
content_type,
count() AS page_views,
uniq(session_id) AS unique_sessions,
countIf(is_bounce = 1) AS bounces,
round(avg(time_on_page), 2) AS avg_time_on_page,
-- Calculate content funnel level
length(splitByChar('/', path(hierarchy_level))) - 1 AS funnel_depth,
CASE
WHEN funnel_depth <= 1 THEN 'Landing Pages'
WHEN funnel_depth <= 2 THEN 'Category Pages'
WHEN funnel_depth <= 3 THEN 'Subcategory Pages'
ELSE 'Content Pages'
END AS funnel_level
FROM content_hierarchy
ARRAY JOIN url_levels AS hierarchy_level
WHERE hierarchy_level != '' AND hierarchy_level IS NOT NULL
GROUP BY hierarchy_level, content_path, page_title, content_category, content_type, funnel_depth, funnel_level
)
SELECT
funnel_level,
funnel_depth,
count() AS unique_content_pages,
sum(page_views) AS total_page_views,
sum(unique_sessions) AS total_sessions,
sum(bounces) AS total_bounces,
round(sum(bounces) * 100.0 / sum(page_views), 2) AS funnel_bounce_rate,
round(sum(page_views) / sum(unique_sessions), 2) AS avg_pages_per_session,
round(avg(avg_time_on_page), 2) AS avg_time_on_page,
-- Content performance by funnel level
round(sum(page_views) * 100.0 / sum(sum(page_views)) OVER (), 2) AS traffic_share_percent,
-- Top performing content at each funnel level
topK(5)(tuple(content_path, page_title, page_views)) AS top_content
FROM funnel_metrics
{% if defined(funnel_level_filter) %}
WHERE funnel_level = {{String(funnel_level_filter)}}
{% end %}
GROUP BY funnel_level, funnel_depth
ORDER BY funnel_depth ASC, total_page_views DESC
{% if defined(limit) %}
LIMIT {{Int32(limit, 20)}}
{% end %}
TYPE ENDPOINT
Step 4: Test the content drilldown APIs
Test your content drilldown analytics APIs locally:
# Test main content drilldown report
curl "http://localhost:7181/v0/pipes/content_drilldown.json?start_date=2024-12-01%2000:00:00&end_date=2024-12-02%2000:00:00"
# Test content drilldown with category filtering
curl "http://localhost:7181/v0/pipes/content_drilldown.json?category_filter=Technology&min_depth=1&max_depth=3"
# Test content funnel analysis
curl "http://localhost:7181/v0/pipes/content_funnel_analysis.json?start_date=2024-12-01%2000:00:00&end_date=2024-12-02%2000:00:00"
# Test funnel analysis for specific content level
curl "http://localhost:7181/v0/pipes/content_funnel_analysis.json?funnel_level_filter=Category%20Pages"
You should see content hierarchy analysis in action, where URLs are broken down into their component levels with Google Analytics-style metrics for each content hierarchy depth.
Step 5: Deploy to production
Once you've tested your content drilldown analytics locally and verified the results, deploy to your Tinybird cloud workspace:
tb --cloud deploy
Your content drilldown APIs are now available in production:
# Production content drilldown API calls
curl "$TINYBIRD_HOST/v0/pipes/content_drilldown.json?domain_filter=yoursite.com&category_filter=Technology&min_depth=1" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
curl "$TINYBIRD_HOST/v0/pipes/content_funnel_analysis.json?start_date=2024-12-01&end_date=2024-12-07" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
The APIs return comprehensive content drilldown analytics:
{
"meta": [
{"name": "hierarchy_level", "type": "String"},
{"name": "content_path", "type": "String"},
{"name": "page_title", "type": "String"},
{"name": "content_category", "type": "String"},
{"name": "content_type", "type": "String"},
{"name": "page_views", "type": "UInt64"},
{"name": "unique_sessions", "type": "UInt64"},
{"name": "unique_users", "type": "UInt64"},
{"name": "bounces", "type": "UInt64"},
{"name": "bounce_rate_percent", "type": "Float64"},
{"name": "avg_time_on_page", "type": "Float64"},
{"name": "hierarchy_depth", "type": "UInt64"},
{"name": "traffic_percentage", "type": "Float64"},
{"name": "top_referrers", "type": "Array(String)"}
],
"data": [
{
"hierarchy_level": "https://example.com/blog/tech/databases/clickhouse-guide",
"content_path": "/blog/tech/databases/clickhouse-guide",
"page_title": "Complete ClickHouse Guide for Analytics",
"content_category": "Technology",
"content_type": "Tutorial",
"page_views": 1,
"unique_sessions": 1,
"unique_users": 1,
"bounces": 0,
"bounce_rate_percent": 0.0,
"avg_time_on_page": 180.0,
"hierarchy_depth": 4,
"traffic_percentage": 25.0,
"top_referrers": ["https://google.com/search"]
},
{
"hierarchy_level": "https://example.com/blog/tech/databases",
"content_path": "/blog/tech/databases",
"page_title": "Database Articles",
"content_category": "Technology",
"content_type": "Category",
"page_views": 2,
"unique_sessions": 2,
"unique_users": 2,
"bounces": 0,
"bounce_rate_percent": 0.0,
"avg_time_on_page": 45.0,
"hierarchy_depth": 3,
"traffic_percentage": 50.0,
"top_referrers": ["https://example.com/blog/tech"]
}
],
"rows": 2,
"statistics": {
"elapsed": 0.008,
"rows_read": 4,
"bytes_read": 1024
}
}
This approach provides:
- Google Analytics-style content drilldown using the URLHierarchy() function for progressive content analysis
- Content hierarchy metrics including page views, sessions, users, bounce rates, and time on page at each content level
- Content funnel analysis to understand user navigation patterns through your content structure
- Content performance comparison across different hierarchy levels (landing pages, categories, subcategories, content pages)
- Flexible filtering by content category, domain, country, device type, and content depth ranges
- Real-time content insights for content strategy, user engagement optimization, and content performance analysis
Your content drilldown data becomes available as production-ready API endpoints that automatically handle complex content hierarchy analysis while providing actionable insights for content teams and marketers. Start building with Tinybird's free plan to create your own content drilldown analytics APIs.
Next steps to ship URL analytics faster
Building URL hierarchy analysis in production involves more than writing URLHierarchy queries—you need reliable data ingestion, query optimization, API deployment, and monitoring infrastructure.
Tinybird manages the operational aspects of running ClickHouse at scale, so teams can use advanced analytical functions like URLHierarchy and deploy URL analysis APIs without managing the underlying database infrastructure.
The platform handles high-throughput data ingestion, automatic query optimization, API authentication and rate limiting, and monitoring—all the operational pieces needed to turn your URLHierarchy queries into production-ready analytics services.
You can sign up for Tinybird for free to build and test your first URL hierarchy analytics API in a few minutes.
Additional resources
- Functions for Working with URLs
- String Functions
- Array Functions
- Materialized Views
- Tinybird Web Analytics Starter Kit
- Real-time analytics with billion rows at scale
- Tinybird vs. ClickHouse: What's the difference?
FAQs about URLHierarchy in ClickHouse
Can I limit the maximum path depth returned by URLHierarchy?
URLHierarchy returns all path levels by default, but you can use array slicing with arraySlice() to limit depth or combine with arrayFilter() for conditional truncation.
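In array terms, limiting depth is just a prefix slice: arraySlice(URLHierarchy(url), 1, 3) keeps the first three levels. The Python equivalent of that slice, for illustration:

```python
hierarchy = [
    'https://example.com/',
    'https://example.com/a/',
    'https://example.com/a/b/',
    'https://example.com/a/b/c',
]
# arraySlice(arr, 1, 3) in ClickHouse (1-based offset) == arr[:3] in Python
print(hierarchy[:3])
```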
Does URLHierarchy support IPv6 addresses and internationalized domain names?
URLHierarchy handles IPv6 addresses in brackets and punycode-encoded international domains, but complex Unicode URLs may need preprocessing with decodeURLComponent().
How do I extract only path segments without the domain from URLHierarchy results?
Use arrayMap() with the path() function on URLHierarchy output, or combine with cutToFirstSignificantSubdomain() to remove domain portions from each hierarchy level.