URL analysis often stops at counting page views per path, missing the bigger picture of how users navigate through your site's structure. When you treat /blog/tech/databases/clickhouse-guide as completely separate from /blog/tech/databases or /blog/tech, you lose insights about traffic patterns at different levels of your content hierarchy.
ClickHouse's URLHierarchy function solves this by expanding any URL into each of its parent levels, truncated at natural boundaries like slashes and query separators, to reveal navigation patterns across your entire site structure. This guide covers everything from basic syntax through production deployment, including performance optimization techniques and real-world implementation patterns for building URL hierarchy analytics that scale.
What URLHierarchy does and why you need it
URLHierarchy takes any URL and breaks it down into a set of progressively longer prefixes, cutting at natural boundaries like forward slashes and question marks. When you feed it https://example.com/blog/category/post-title?param=value, you get back an array that starts with the protocol and host, grows by one path segment at a time, and ends with the full URL including the query string.
This becomes incredibly useful when you're analyzing how people navigate through your website. Instead of treating each URL as a separate entity, you can see traffic patterns at different levels of your site structure—from broad categories down to specific pages.
Basic syntax and return format
The function signature is straightforward: URLHierarchy(url) takes a String and returns an Array(String). Each element in that array represents one level of the URL hierarchy, with the shortest version first.
Here's what happens when you run it:
SELECT URLHierarchy('https://shop.example.com/electronics/laptops/gaming?brand=dell')
You'll get an array with these elements, shortest first:
https://shop.example.com/
https://shop.example.com/electronics/
https://shop.example.com/electronics/laptops/
https://shop.example.com/electronics/laptops/gaming?
https://shop.example.com/electronics/laptops/gaming?brand=dell
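To make the truncation rule concrete, here is a small Python sketch that mimics the documented behavior (cut after each run of / or ? separators past the scheme and host, shortest element first). This is an illustration of the rule, not ClickHouse's actual implementation:

```python
import re

def url_hierarchy(url):
    """Mimic ClickHouse URLHierarchy: truncate after each run of '/' or '?'
    separators past the scheme and host, shortest element first."""
    m = re.match(r'^[a-z][a-z0-9+.-]*://[^/?#]*', url)
    start = m.end() if m else 0
    levels = []
    i = start
    while i < len(url):
        if url[i] in '/?':
            # consecutive separators count as one; cut after the whole run
            while i + 1 < len(url) and url[i + 1] in '/?':
                i += 1
            levels.append(url[:i + 1])
        i += 1
    if not levels or levels[-1] != url:
        levels.append(url)  # the full URL is always the last element
    return levels

for level in url_hierarchy('https://shop.example.com/electronics/laptops/gaming?brand=dell'):
    print(level)
```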
Parameters and types
URLHierarchy accepts one String parameter containing the URL to process. The return type is Array(String), where each array element represents a truncation point in the original URL.
The function has been available since the earliest open-source ClickHouse releases, though edge case handling has improved over time. If you're running an older version, you might encounter inconsistencies with complex URL formats.
Sample return array for a deep URL
Let's look at a more complex example to see how the function handles different URL components:
SELECT URLHierarchy('https://docs.example.com/api/v2/users/123/settings?tab=security&view=advanced#section-2')
The returned array contains one element per / and ? boundary, starting from the protocol and host and growing toward the full URL. The fragment (#section-2) is not a truncation point of its own: it appears only as part of the final, complete-URL element.
Step-by-step: Build a URL hierarchy query
Building useful analytics from URLHierarchy requires combining it with array manipulation functions. The process involves generating the hierarchy array, then exploding it into separate rows for analysis.
1. Select raw URLs
Start with your source table containing URL data. Most web analytics scenarios involve pageview or session tables:
SELECT url, user_id, timestamp
FROM pageviews
WHERE date = today()
LIMIT 10
Your URL column typically contains complete request URLs with paths, query parameters, and sometimes fragments that you want to analyze at different hierarchy levels.
2. Apply URLHierarchy
Add the URLHierarchy function to generate hierarchy arrays for each URL:
SELECT
url,
URLHierarchy(url) AS url_levels
FROM pageviews
WHERE date = today()
LIMIT 10
At this point, each row contains the original URL plus an array of all its hierarchy levels. The array format isn't immediately useful for aggregation—you need to convert it into separate rows.
3. Explode with arrayJoin
Use arrayJoin to flatten the hierarchy array, creating one row for each level:
SELECT
original_url,
hierarchy_level
FROM (
SELECT
url AS original_url,
URLHierarchy(url) AS url_levels
FROM pageviews
WHERE date = today()
)
ARRAY JOIN url_levels AS hierarchy_level
Now each original URL generates multiple rows—one for each level in its path hierarchy. This transformation enables you to aggregate traffic at each path depth.
4. Aggregate metrics by depth
Group by hierarchy level to count visits and analyze traffic patterns:
SELECT
hierarchy_level,
COUNT(*) AS page_views,
COUNT(DISTINCT user_id) AS unique_visitors
FROM (
SELECT
url,
user_id,
URLHierarchy(url) AS url_levels
FROM pageviews
WHERE date >= today() - 7
)
ARRAY JOIN url_levels AS hierarchy_level
GROUP BY hierarchy_level
ORDER BY page_views DESC
LIMIT 20
This query reveals which sections of your site receive the most traffic when you consider the hierarchical nature of URL paths.
Edge cases and path boundary rules
URLHierarchy handles various URL formats, but understanding its behavior with edge cases helps you write more robust queries and interpret results correctly.
Handling query strings
URLHierarchy always truncates at the ? character that separates the path from query parameters. When your URL contains search parameters or filters, the function includes both the full URL with parameters and a version cut at the query boundary.
This behavior becomes important when analyzing search pages or filtered product listings—you might want to group URLs by their base path regardless of the specific parameters users applied.
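For example, if you want to bucket filtered listing URLs by their base path, you can strip the query before hierarchy analysis. A quick Python sketch of the idea (in ClickHouse itself, the cutQueryString function achieves the same thing):

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))

urls = [
    'https://shop.example.com/search?q=laptop',
    'https://shop.example.com/search?q=phone&sort=price',
]
# Both filtered variants collapse to the same base path
print({strip_query(u) for u in urls})
```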
Double or trailing slashes
Consecutive forward slashes and trailing slashes create unexpected hierarchy levels. URLHierarchy treats https://example.com//blog/ differently from https://example.com/blog/, potentially creating empty or redundant hierarchy levels in your results.

You can normalize URLs before processing with replaceRegexpAll(url, '([^:])/+', '\\1/') to collapse repeated slashes in the path (a bare '/+' pattern would also mangle the :// after the scheme), though this adds complexity to your queries.
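One caveat when collapsing slashes: a naive '/+' pattern also collapses the '//' in 'https://'. A pattern that only collapses a run of slashes following a non-colon character leaves the scheme separator intact. Here is a Python sketch of that regex; the same pattern works with ClickHouse's replaceRegexpAll:

```python
import re

def collapse_slashes(url):
    """Collapse repeated '/' in the path while leaving '://' intact:
    a run of slashes is only collapsed when preceded by a non-colon char."""
    return re.sub(r'([^:])/+', r'\1/', url)

print(collapse_slashes('https://example.com//blog///post/'))
```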
Non-HTTP schemes
URLHierarchy works with non-HTTP schemes like ftp://, file://, or custom application schemes. However, the hierarchy logic remains focused on slash-separated paths, which might not match the conventions of specialized schemes.
For custom schemes, you might need to combine URLHierarchy with string manipulation functions to achieve the hierarchy structure your analysis requires.
Performance tips for large-scale web data
Processing millions of URLs with URLHierarchy can consume significant memory and processing resources, especially when arrayJoin creates multiple output rows for each input URL.
Memory-safe array joins
Large hierarchy arrays can cause memory pressure when arrayJoin explodes them into separate rows. Consider batching your queries by date ranges or limiting the number of URLs processed at once:
SELECT hierarchy_level, COUNT(*) AS visits
FROM (
SELECT URLHierarchy(url) AS url_levels
FROM pageviews
WHERE date = today() - 1 -- Process one day at a time
AND cityHash64(user_id) % 100 = 0 -- Sample ~1% of users (hash first: user_id is a String)
)
ARRAY JOIN url_levels AS hierarchy_level
GROUP BY hierarchy_level
This approach prevents memory overflow while still allowing you to process large datasets effectively.
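Note that the sampling predicate hashes user_id before taking the modulo (in ClickHouse, cityHash64(user_id) % 100 = 0, since a direct modulo on a String column fails). Hashing also makes the sample deterministic: all of a given user's rows stay in or out of the sample together, unlike random sampling. A Python sketch of the same idea, with md5 standing in for cityHash64:

```python
import hashlib

def in_sample(user_id, pct=1):
    """Keep a stable pct% of users by hashing the id into 100 buckets."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return bucket < pct

users = ['user_%d' % i for i in range(1000)]
kept = [u for u in users if in_sample(u)]
print(len(kept))  # roughly 1% of 1000 users
```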
LowCardinality dictionaries
Many websites follow predictable URL patterns where the same paths appear repeatedly. If your URL column has a bounded number of distinct values, declaring it as LowCardinality(String) can improve compression and query performance:
CREATE TABLE pageviews (
url LowCardinality(String),
user_id String,
timestamp DateTime
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
LowCardinality encoding works particularly well when your URLs follow consistent patterns with repeated domain names and common path structures. For columns with millions of distinct URLs, however, a plain String column often compresses and performs better.
Using projections or indexes
Projections store pre-computed results inside the table that ClickHouse uses automatically when a query matches the projection structure. However, a projection definition only supports a plain SELECT with GROUP BY over the table's own columns (no subqueries and no ARRAY JOIN), so you can't project an exploded URLHierarchy directly. What you can do is precompute the hierarchy into a materialized column, so it's stored at insert time instead of recomputed on every query:
ALTER TABLE pageviews
ADD COLUMN url_levels Array(String) MATERIALIZED URLHierarchy(url)
Queries can then ARRAY JOIN url_levels directly without calling URLHierarchy at read time. For fully pre-aggregated hierarchy counts, use the materialized view approach in the next section.
Persisting hierarchies with a materialized view
When you're repeatedly analyzing URL hierarchies, creating a materialized view eliminates the need to recompute URLHierarchy results for every query: the hierarchy is exploded and aggregated once at insert time, and subsequent queries read the pre-computed results directly.
View definition
Create a materialized view that processes incoming URL data and stores the exploded hierarchy:
CREATE MATERIALIZED VIEW url_hierarchy_stats
ENGINE = AggregatingMergeTree()
ORDER BY (hierarchy_level, date)
AS SELECT
hierarchy_level,
toDate(timestamp) AS date,
countState() AS visits,
uniqState(user_id) AS unique_visitors
FROM (
SELECT
timestamp,
user_id,
URLHierarchy(url) AS url_levels
FROM pageviews
)
ARRAY JOIN url_levels AS hierarchy_level
GROUP BY hierarchy_level, toDate(timestamp)
This materialized view automatically processes new data as it arrives, maintaining pre-computed hierarchy statistics. Note the AggregatingMergeTree engine: uniq() values can't be merged correctly by summing, so the view stores aggregate states instead, and you query them with the -Merge combinators (countMerge(visits), uniqMerge(unique_visitors)).
Refresh and reparse strategy
Materialized views in ClickHouse only process new inserts—they don't automatically handle changes to existing data. If you need to reprocess historical data or handle URL structure changes, you'll need to recreate the view:
DROP VIEW url_hierarchy_stats;
-- Recreate with updated logic
CREATE MATERIALIZED VIEW url_hierarchy_stats ...
For production systems, consider implementing a periodic refresh strategy or using incremental processing to handle data updates without full rebuilds.
Related ClickHouse URL functions and when to use them
URLHierarchy works alongside other ClickHouse URL functions, each optimized for different analytical scenarios. Understanding when to use each function helps you choose the right tool for your specific use case.
URL component functions
ClickHouse has no single parseURL function. Instead, it provides dedicated extractors (protocol, domain, path, queryString, fragment) for pulling out structured components when you need specific parts rather than hierarchical analysis:
SELECT
domain(url) AS host,
path(url) AS url_path,
queryString(url) AS query_string
FROM pageviews
Use these component functions when building domain-level reports or extracting specific URL parts for filtering and grouping operations.
URLPathHierarchy
URLPathHierarchy works like URLHierarchy but omits the protocol and host, returning only path-level prefixes:
SELECT
URLPathHierarchy('https://example.com/browse/CONDUCTORS/TUBE-TRAPS') AS path_levels
This returns ['/browse/', '/browse/CONDUCTORS/', '/browse/CONDUCTORS/TUBE-TRAPS']. Choose URLPathHierarchy over URLHierarchy when you want to aggregate the same paths across multiple domains, or when the host portion isn't relevant to your analysis.
domain
The domain function extracts just the hostname without protocol, making it ideal for grouping by site:
SELECT
domain(referrer) as referrer_domain,
COUNT() as referrals
FROM pageviews
WHERE referrer != ''
GROUP BY referrer_domain
ORDER BY referrals DESC
Domain extraction works particularly well for referrer analysis and understanding traffic sources.
topLevelDomain
topLevelDomain returns only the TLD without the leading dot (com, org, uk), which helps with geographic or organizational analysis:
SELECT
topLevelDomain(url) as tld,
COUNT() as visits
FROM pageviews
GROUP BY tld
ORDER BY visits DESC
This function works well for understanding the geographic distribution of your traffic or analyzing different types of organizations visiting your site.
Troubleshooting common errors
URLHierarchy can produce unexpected results when working with malformed URLs or edge cases in real-world data. Here are the most common issues and how to handle them.
NULL or empty returns
URLHierarchy returns an empty array for empty or unparseable strings (for a non-Nullable String column it never returns NULL). Handle these cases defensively:
SELECT
url,
CASE
WHEN length(URLHierarchy(url)) > 0
THEN URLHierarchy(url)
ELSE ['invalid_url']
END as hierarchy_array
FROM pageviews
WHERE url IS NOT NULL AND url != ''
This pattern ensures your arrayJoin operations don't lose rows due to empty hierarchy arrays.
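To see why this matters, here is a tiny Python simulation of ARRAY JOIN semantics: one output row per array element, so a row with an empty hierarchy array produces no output rows and silently disappears:

```python
rows = [
    ('https://example.com/a/b',
     ['https://example.com/', 'https://example.com/a/', 'https://example.com/a/b']),
    ('not a url', []),  # malformed input -> empty hierarchy array
]

# ARRAY JOIN: emit (url, level) once per array element; empty arrays emit nothing
exploded = [(url, level) for url, levels in rows for level in levels]
print(len(exploded))  # 3 rows: the malformed row vanished
```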
Version compatibility issues
URLHierarchy behavior has evolved across ClickHouse versions, particularly around handling IPv6 addresses and international domain names, with later releases improving support for complex URL formats.
If you're seeing inconsistent results after upgrading ClickHouse, test your URLHierarchy queries against a sample of your production data to identify behavioral changes.
Unexpected array lengths
URLs with unusual structures can produce hierarchy arrays with more or fewer elements than expected. Debug unexpected array lengths by examining the raw URLHierarchy output:
SELECT
url,
URLHierarchy(url) as hierarchy,
length(URLHierarchy(url)) as hierarchy_depth
FROM pageviews
WHERE length(URLHierarchy(url)) > 10 -- Find unusually deep hierarchies
LIMIT 10
This query helps identify URLs that create unexpectedly complex hierarchies, often due to malformed paths or unusual URL structures.
Building a ClickHouse-based "content drilldown" API with Tinybird
A common URL hierarchy analytics use case is a content drilldown report that shows how users navigate through your site's content hierarchy, from broad subfolders down to specific pages. It reveals which sections drive the most traffic and where users drop off in your content funnel. Building this functionality requires aggregating page views across different levels of your URL hierarchy, exactly what ClickHouse's URLHierarchy function enables. The same analysis also provides a foundation for more advanced scenarios, such as per-tenant analytics and AI-powered insights into behavior across customer segments.
Here's how to build a complete content drilldown analytics API that replicates and extends Google Analytics' content reporting capabilities using Tinybird's managed ClickHouse service.
Step 1: Set up content analytics data source
Create a data source optimized for content drilldown analysis. This captures page views with the URL structure needed for hierarchical content reporting:
SCHEMA >
`timestamp` DateTime64(3, 'UTC') `json:$.timestamp`,
`session_id` String `json:$.session_id`,
`user_id` String `json:$.user_id`,
`page_url` String `json:$.page_url`,
`page_title` String `json:$.page_title`,
`content_category` String `json:$.content_category`,
`content_type` String `json:$.content_type`,
`referrer_url` String `json:$.referrer_url`,
`user_agent` String `json:$.user_agent`,
`country` LowCardinality(String) `json:$.country`,
`device_type` LowCardinality(String) `json:$.device_type`,
`is_bounce` UInt8 `json:$.is_bounce`,
`time_on_page` UInt32 `json:$.time_on_page`
ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, session_id"
Step 2: Ingest sample content data
Start the local Tinybird development server:
tb dev
Ingest sample content analytics data that represents a typical content hierarchy:
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TINYBIRD_TOKEN" \
"http://localhost:7181/v0/events?name=web_analytics" \
-d '[
{
"timestamp": "2024-12-01 10:30:00",
"session_id": "sess_001",
"user_id": "user_123",
"page_url": "https://example.com/blog/tech/databases/clickhouse-guide",
"page_title": "Complete ClickHouse Guide for Analytics",
"content_category": "Technology",
"content_type": "Tutorial",
"referrer_url": "https://google.com/search",
"user_agent": "Mozilla/5.0 Chrome/91.0",
"country": "US",
"device_type": "desktop",
"is_bounce": 0,
"time_on_page": 180
},
{
"timestamp": "2024-12-01 11:15:00",
"session_id": "sess_002",
"user_id": "user_456",
"page_url": "https://example.com/blog/tech/databases",
"page_title": "Database Articles",
"content_category": "Technology",
"content_type": "Category",
"referrer_url": "https://example.com/blog/tech",
"user_agent": "Mozilla/5.0 Safari/14.0",
"country": "UK",
"device_type": "mobile",
"is_bounce": 0,
"time_on_page": 45
},
{
"timestamp": "2024-12-01 12:45:00",
"session_id": "sess_003",
"user_id": "user_789",
"page_url": "https://example.com/blog/tech",
"page_title": "Technology Blog",
"content_category": "Technology",
"content_type": "Section",
"referrer_url": "https://example.com/",
"user_agent": "Mozilla/5.0 Firefox/89.0",
"country": "CA",
"device_type": "desktop",
"is_bounce": 0,
"time_on_page": 120
},
{
"timestamp": "2024-12-01 14:20:00",
"session_id": "sess_004",
"user_id": "user_012",
"page_url": "https://example.com/blog",
"page_title": "Blog Homepage",
"content_category": "Blog",
"content_type": "Landing",
"referrer_url": "https://example.com/",
"user_agent": "Mozilla/5.0 Chrome/91.0",
"country": "US",
"device_type": "desktop",
"is_bounce": 1,
"time_on_page": 15
}
]'
Step 3: Build content drilldown APIs
We can now create APIs that replicate Google Analytics' content drilldown functionality. These show traffic patterns across your content hierarchy from broad categories to specific pages.
Create content_drilldown.pipe for the main content drilldown report:
NODE content_drilldown
SQL >
%
WITH content_hierarchy AS (
SELECT
timestamp,
session_id,
user_id,
page_url,
page_title,
content_category,
content_type,
country,
device_type,
is_bounce,
time_on_page,
URLHierarchy(page_url) AS url_levels
FROM web_analytics
WHERE timestamp >= {{DateTime(start_date, '2024-12-01 00:00:00')}}
AND timestamp <= {{DateTime(end_date, '2024-12-02 00:00:00')}}
{% if defined(domain_filter) %}
AND domain(page_url) = {{String(domain_filter)}}
{% end %}
{% if defined(category_filter) %}
AND content_category = {{String(category_filter)}}
{% end %}
{% if defined(country_filter) %}
AND country = {{String(country_filter)}}
{% end %}
)
SELECT
hierarchy_level,
path(hierarchy_level) AS content_path,
page_title,
content_category,
content_type,
count() AS page_views,
uniq(session_id) AS unique_sessions,
uniq(user_id) AS unique_users,
countIf(is_bounce = 1) AS bounces,
round(countIf(is_bounce = 1) * 100.0 / count(), 2) AS bounce_rate_percent,
round(avg(time_on_page), 2) AS avg_time_on_page,
-- Calculate hierarchy depth for drilldown navigation
length(splitByChar('/', path(hierarchy_level))) - 1 AS hierarchy_depth,
-- Content performance metrics
round(count() * 100.0 / sum(count()) OVER (), 2) AS traffic_percentage,
-- Top referrers for this content level
topK(3)(referrer_url) AS top_referrers
FROM content_hierarchy
ARRAY JOIN url_levels AS hierarchy_level
WHERE hierarchy_level != ''
AND hierarchy_level IS NOT NULL
{% if defined(min_depth) %}
AND length(splitByChar('/', path(hierarchy_level))) - 1 >= {{Int32(min_depth, 0)}}
{% end %}
{% if defined(max_depth) %}
AND length(splitByChar('/', path(hierarchy_level))) - 1 <= {{Int32(max_depth, 10)}}
{% end %}
GROUP BY hierarchy_level, page_title, content_category, content_type, hierarchy_depth
ORDER BY page_views DESC
{% if defined(limit) %}
LIMIT {{Int32(limit, 100)}}
{% end %}
TYPE ENDPOINT
Create a content funnel analysis API in content_funnel_analysis.pipe:
NODE content_funnel_analysis
SQL >
%
WITH content_hierarchy AS (
SELECT
timestamp,
session_id,
user_id,
page_url,
page_title,
content_category,
content_type,
is_bounce,
time_on_page,
URLHierarchy(page_url) AS url_levels
FROM web_analytics
WHERE timestamp >= {{DateTime(start_date, '2024-12-01 00:00:00')}}
AND timestamp <= {{DateTime(end_date, '2024-12-02 00:00:00')}}
{% if defined(domain_filter) %}
AND domain(page_url) = {{String(domain_filter)}}
{% end %}
{% if defined(category_filter) %}
AND content_category = {{String(category_filter)}}
{% end %}
),
funnel_metrics AS (
SELECT
hierarchy_level,
path(hierarchy_level) AS content_path,
page_title,
content_category,
content_type,
count() AS page_views,
uniq(session_id) AS unique_sessions,
countIf(is_bounce = 1) AS bounces,
round(avg(time_on_page), 2) AS avg_time_on_page,
-- Calculate content funnel level
length(splitByChar('/', path(hierarchy_level))) - 1 AS funnel_depth,
CASE
WHEN funnel_depth <= 1 THEN 'Landing Pages'
WHEN funnel_depth <= 2 THEN 'Category Pages'
WHEN funnel_depth <= 3 THEN 'Subcategory Pages'
ELSE 'Content Pages'
END AS funnel_level
FROM content_hierarchy
ARRAY JOIN url_levels AS hierarchy_level
WHERE hierarchy_level != '' AND hierarchy_level IS NOT NULL
GROUP BY hierarchy_level, content_path, page_title, content_category, content_type, funnel_depth, funnel_level
)
SELECT
funnel_level,
funnel_depth,
count() AS unique_content_pages,
sum(page_views) AS total_page_views,
sum(unique_sessions) AS total_sessions,
sum(bounces) AS total_bounces,
round(sum(bounces) * 100.0 / sum(page_views), 2) AS funnel_bounce_rate,
round(sum(page_views) / sum(unique_sessions), 2) AS avg_pages_per_session,
round(avg(avg_time_on_page), 2) AS avg_time_on_page,
-- Content performance by funnel level
round(sum(page_views) * 100.0 / sum(sum(page_views)) OVER (), 2) AS traffic_share_percent,
-- Top performing content at each funnel level
topK(5)(tuple(content_path, page_title, page_views)) AS top_content
FROM funnel_metrics
{% if defined(funnel_level_filter) %}
WHERE funnel_level = {{String(funnel_level_filter)}}
{% end %}
GROUP BY funnel_level, funnel_depth
ORDER BY funnel_depth ASC, total_page_views DESC
{% if defined(limit) %}
LIMIT {{Int32(limit, 20)}}
{% end %}
TYPE ENDPOINT
Step 4: Test the content drilldown APIs
Test your content drilldown analytics APIs locally:
# Test main content drilldown report
curl "http://localhost:7181/v0/pipes/content_drilldown.json?start_date=2024-12-01%2000:00:00&end_date=2024-12-02%2000:00:00"
# Test content drilldown with category filtering
curl "http://localhost:7181/v0/pipes/content_drilldown.json?category_filter=Technology&min_depth=1&max_depth=3"
# Test content funnel analysis
curl "http://localhost:7181/v0/pipes/content_funnel_analysis.json?start_date=2024-12-01%2000:00:00&end_date=2024-12-02%2000:00:00"
# Test funnel analysis for specific content level
curl "http://localhost:7181/v0/pipes/content_funnel_analysis.json?funnel_level_filter=Category%20Pages"
You should see content hierarchy analysis in action, where URLs are broken down into their component levels with Google Analytics-style metrics for each content hierarchy depth.
Step 5: Deploy to production
Once you've tested your content drilldown analytics locally and verified the results, deploy to your Tinybird cloud workspace:
tb --cloud deploy
Your content drilldown APIs are now available in production:
# Production content drilldown API calls
curl "$TINYBIRD_HOST/v0/pipes/content_drilldown.json?domain_filter=yoursite.com&category_filter=Technology&min_depth=1" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
curl "$TINYBIRD_HOST/v0/pipes/content_funnel_analysis.json?start_date=2024-12-01&end_date=2024-12-07" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
The APIs return comprehensive content drilldown analytics:
{
"meta": [
{"name": "hierarchy_level", "type": "String"},
{"name": "content_path", "type": "String"},
{"name": "page_title", "type": "String"},
{"name": "content_category", "type": "String"},
{"name": "content_type", "type": "String"},
{"name": "page_views", "type": "UInt64"},
{"name": "unique_sessions", "type": "UInt64"},
{"name": "unique_users", "type": "UInt64"},
{"name": "bounces", "type": "UInt64"},
{"name": "bounce_rate_percent", "type": "Float64"},
{"name": "avg_time_on_page", "type": "Float64"},
{"name": "hierarchy_depth", "type": "UInt64"},
{"name": "traffic_percentage", "type": "Float64"},
{"name": "top_referrers", "type": "Array(String)"}
],
"data": [
{
"hierarchy_level": "https://example.com/blog/tech/databases/clickhouse-guide",
"content_path": "/blog/tech/databases/clickhouse-guide",
"page_title": "Complete ClickHouse Guide for Analytics",
"content_category": "Technology",
"content_type": "Tutorial",
"page_views": 1,
"unique_sessions": 1,
"unique_users": 1,
"bounces": 0,
"bounce_rate_percent": 0.0,
"avg_time_on_page": 180.0,
"hierarchy_depth": 4,
"traffic_percentage": 25.0,
"top_referrers": ["https://google.com/search"]
},
{
"hierarchy_level": "https://example.com/blog/tech/databases",
"content_path": "/blog/tech/databases",
"page_title": "Database Articles",
"content_category": "Technology",
"content_type": "Category",
"page_views": 2,
"unique_sessions": 2,
"unique_users": 2,
"bounces": 0,
"bounce_rate_percent": 0.0,
"avg_time_on_page": 45.0,
"hierarchy_depth": 3,
"traffic_percentage": 50.0,
"top_referrers": ["https://example.com/blog/tech"]
}
],
"rows": 2,
"statistics": {
"elapsed": 0.008,
"rows_read": 4,
"bytes_read": 1024
}
}
This approach provides:
- Google Analytics-style content drilldown using the URLHierarchy() function for progressive content analysis
- Content hierarchy metrics including page views, sessions, users, bounce rates, and time on page at each content level
- Content funnel analysis to understand user navigation patterns through your content structure
- Content performance comparison across different hierarchy levels (landing pages, categories, subcategories, content pages)
- Flexible filtering by content category, domain, country, device type, and content depth ranges
- Real-time content insights for content strategy, user engagement optimization, and content performance analysis
Your content drilldown data becomes available as production-ready API endpoints that automatically handle complex content hierarchy analysis while providing actionable insights for content teams and marketers. Start building with Tinybird's free plan to create your own content drilldown analytics APIs.
Next steps to ship URL analytics faster
Building URL hierarchy analysis in production involves more than writing URLHierarchy queries—you need reliable data ingestion, query optimization, API deployment, and monitoring infrastructure.
Tinybird manages the operational aspects of running ClickHouse at scale, so teams can use advanced analytical functions like URLHierarchy and deploy URL analysis APIs without managing the underlying database infrastructure.
The platform handles high-throughput data ingestion, automatic query optimization, API authentication and rate limiting, and monitoring—all the operational pieces needed to turn your URLHierarchy queries into production-ready analytics services.
You can sign up for Tinybird for free to build and test your first URL hierarchy analytics API in a few minutes.
Additional resources
- Functions for Working with URLs
- String Functions
- Array Functions
- Materialized Views
- Tinybird Web Analytics Starter Kit
- Real-time analytics with billion rows at scale
- Tinybird vs. ClickHouse: What's the difference?
FAQs about URLHierarchy in ClickHouse
Can I limit the maximum path depth returned by URLHierarchy?
URLHierarchy returns all path levels by default, but you can use array slicing with arraySlice() to limit depth or combine with arrayFilter() for conditional truncation.
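In array terms, limiting depth is just a prefix slice: arraySlice(URLHierarchy(url), 1, 3) keeps the first three levels. The Python equivalent of that slice, for illustration:

```python
hierarchy = [
    'https://example.com/',
    'https://example.com/a/',
    'https://example.com/a/b/',
    'https://example.com/a/b/c',
]
# arraySlice(arr, 1, 3) in ClickHouse (1-based offset) == arr[:3] in Python
print(hierarchy[:3])
```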
Does URLHierarchy support IPv6 addresses and internationalized domain names?
URLHierarchy handles IPv6 addresses in brackets and punycode-encoded international domains, but complex Unicode URLs may need preprocessing with decodeURLComponent().
How do I extract only path segments without the domain from URLHierarchy results?
Use arrayMap() with the path() function on URLHierarchy output, or combine with cutToFirstSignificantSubdomain() to remove domain portions from each hierarchy level.