When you're analyzing web traffic or building domain-level analytics, URLs arrive in frustratingly inconsistent formats: some with www prefixes, others without. That inconsistency makes it impossible to group www.example.com and example.com as the same domain. ClickHouse's domainWithoutWWW function solves this by extracting clean domain names from any URL format while automatically stripping the www prefix. This guide walks you through the function's syntax and edge cases, shows you how to process entire tables efficiently, and demonstrates how to build real-time domain analytics APIs using scalable infrastructure.
What domainWithoutWWW does and when to use it
The domainWithoutWWW function extracts the domain from a URL and automatically removes any leading "www." prefix if present. When you pass 'https://www.example.com/path' to this function, it returns 'example.com', stripping away the protocol, path, query parameters, and the www prefix in one operation.
This function becomes essential when you're working with web analytics data where URLs might inconsistently include or exclude the www prefix. Web traffic often comes from both www.example.com and example.com, but you want to treat them as the same domain for analytics purposes.
You'll find it particularly useful for grouping page views by domain, normalizing referrer data, or preparing URLs for domain-level aggregations. The function handles URL parsing complexity internally, so you don't have to write custom string manipulation logic to achieve consistent domain extraction across your dataset.
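As an illustration, normalizing referrer data becomes a single expression. This sketch assumes a hypothetical events table with a referrer String column:

```sql
-- Group inbound traffic by referring domain, so that
-- www.google.com and google.com count as the same source.
SELECT
    domainWithoutWWW(referrer) AS referrer_domain,
    count() AS hits
FROM events
GROUP BY referrer_domain
ORDER BY hits DESC;
```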
Syntax and parameters
The function follows a straightforward syntax pattern:
domainWithoutWWW(url)
The single parameter accepts a String type (or Nullable(String) if your data contains null values). You can pass full URLs with protocols, bare hostnames, or even URLs with complex query strings and fragments, and the function intelligently extracts just the domain portion.
Here's how it works with a simple example:
SELECT domainWithoutWWW('https://www.example.com/path?query=value') AS clean_domain;
-- Returns: 'example.com'
Return type and edge cases
The function returns a String (or Nullable(String) if your input column allows nulls). However, you'll encounter specific behaviors with malformed or unusual inputs that are worth understanding upfront.
When the function can't parse a valid domain from the input, it typically returns an empty string rather than throwing an error. This graceful handling means your queries won't break on messy data, though you might want to filter out empty results depending on your use case.
Empty Or Null Input
Empty strings and null values behave predictably:
SELECT domainWithoutWWW('') AS empty_result;
-- Returns: ''
SELECT domainWithoutWWW(NULL) AS null_result;
-- Returns: NULL
If you're working with a Nullable(String) column and want to avoid nulls in your results, you can use coalesce:
SELECT coalesce(domainWithoutWWW(url), '') AS clean_domain
FROM your_table;
Internationalized Domains
ClickHouse URL functions work with ASCII hostnames, so internationalized domains work best when already converted to punycode format:
SELECT domainWithoutWWW('https://www.xn--e1afmkfd.xn--p1ai') AS punycode_domain;
-- Returns: 'xn--e1afmkfd.xn--p1ai'
Unicode hostnames passed directly might produce inconsistent results depending on your ClickHouse build and locale settings.
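Recent ClickHouse releases (24.1 and later) include an idnaEncode function that converts Unicode hostnames to their ASCII (punycode) form. If your build has it, you can normalize Unicode input before extraction; a sketch:

```sql
-- idnaEncode (ClickHouse 24.1+) converts a Unicode hostname to punycode,
-- giving domainWithoutWWW the ASCII form it parses reliably.
SELECT domainWithoutWWW(concat('https://', idnaEncode('www.пример.рф'))) AS clean_domain;
-- Returns: 'xn--e1afmkfd.xn--p1ai'
```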
Missing Protocol
The function can extract domains from both full URLs and bare hostnames, though results vary with malformed strings:
SELECT domainWithoutWWW('www.example.com') AS bare_hostname;
-- Returns: 'example.com'
SELECT domainWithoutWWW('example.com/some/path') AS with_path;
-- Returns: 'example.com'
SELECT domainWithoutWWW('/just/a/path') AS path_only;
-- Returns: ''
Quick examples on single URLs
Let's walk through practical examples that demonstrate the function's behavior across different URL formats you'll encounter in real data.
Standard www URLs get cleaned as expected:
SELECT domainWithoutWWW('http://www.example.com') AS http_www;
-- Returns: 'example.com'
SELECT domainWithoutWWW('https://www.example.com:8443?q=1') AS https_with_port;
-- Returns: 'example.com'
URLs without www remain unchanged (except for protocol and path removal):
SELECT domainWithoutWWW('https://example.com') AS no_www;
-- Returns: 'example.com'
SELECT domainWithoutWWW('sub.example.com') AS subdomain;
-- Returns: 'sub.example.com'
Complex URLs with paths, queries, and fragments still extract cleanly:
SELECT domainWithoutWWW('https://www.blog.example.com/path/article?id=123#section') AS complex_url;
-- Returns: 'blog.example.com'
SELECT domainWithoutWWW('ftp://www.example.co.uk/files/download') AS ftp_url;
-- Returns: 'example.co.uk'
Notice how the function preserves subdomains like "blog" while removing only the "www" prefix specifically.
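If you want to collapse all subdomains down to the registrable domain rather than strip only www, ClickHouse's cutToFirstSignificantSubdomain function is the better fit:

```sql
-- Reduce any subdomain to the registrable domain.
SELECT cutToFirstSignificantSubdomain('https://blog.example.com/article') AS registrable;
-- Returns: 'example.com'
```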
Processing an entire ClickHouse table
Moving from single values to processing entire tables unlocks the real power of domain normalization at scale. Let's assume you have a table structure like this:
CREATE TABLE visits (
ts DateTime,
url String,
user_id UInt64
) ENGINE = MergeTree
ORDER BY ts;
Selecting Clean Domains
Extract and analyze cleaned domains across your dataset:
SELECT
domainWithoutWWW(url) AS clean_domain,
count() AS visits,
uniq(user_id) AS unique_users
FROM visits
WHERE ts >= now() - INTERVAL 7 DAY
GROUP BY clean_domain
ORDER BY visits DESC
LIMIT 10;
You can filter out malformed URLs that produce empty domains:
SELECT *
FROM visits
WHERE domainWithoutWWW(url) != ''
AND ts >= today() - 1;
Updating In Place
For tables where you frequently query by clean domain, consider adding a dedicated column:
ALTER TABLE visits ADD COLUMN clean_domain String DEFAULT '';
ALTER TABLE visits UPDATE clean_domain = domainWithoutWWW(url) WHERE 1;
Performance consideration: ALTER...UPDATE operations rewrite data and can be expensive on large tables. Schedule during low-traffic periods or consider the materialized column approach instead.
A materialized column automatically computes the clean domain for new data:
ALTER TABLE visits ADD COLUMN clean_domain String MATERIALIZED domainWithoutWWW(url);
Materialized View Pattern
For high-frequency analytics queries, you can precompute domain aggregations using materialized views:
CREATE TABLE domain_stats (
clean_domain String,
visits AggregateFunction(count),
unique_users AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree
ORDER BY clean_domain;
CREATE MATERIALIZED VIEW visits_to_domain_stats
TO domain_stats
AS SELECT
domainWithoutWWW(url) AS clean_domain,
countState() AS visits,
uniqState(user_id) AS unique_users
FROM visits
GROUP BY clean_domain;
This pattern precomputes domain-level metrics at insert time, dramatically speeding up dashboard queries that aggregate by domain.
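To read from the aggregating target table, finalize the stored states with the matching -Merge combinators:

```sql
-- Finalize the aggregate states written by the materialized view.
SELECT
    clean_domain,
    countMerge(visits) AS visits,
    uniqMerge(unique_users) AS unique_users
FROM domain_stats
GROUP BY clean_domain
ORDER BY visits DESC
LIMIT 10;
```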
domainWithoutWWW vs cutWWW vs domain
ClickHouse provides several related functions for URL and domain processing, each optimized for different use cases:
- domainWithoutWWW: extracts the domain from full URLs and removes the www prefix
- cutWWW: removes the www prefix from already-extracted domain strings
- domain: extracts the domain from URLs but preserves www
Here's how they differ in practice:
-- domainWithoutWWW: Full URL processing with www removal
SELECT domainWithoutWWW('https://www.example.com/path') AS result;
-- Returns: 'example.com'
-- cutWWW: String processing for already-extracted domains
SELECT cutWWW('www.example.com') AS result;
-- Returns: 'example.com'
-- domain: Full URL processing but keeps www
SELECT domain('https://www.example.com/path') AS result;
-- Returns: 'www.example.com'
Choose domainWithoutWWW when you're working with raw URL data and want normalized domain grouping. Use cutWWW for post-processing already-extracted domains, and domain when the www distinction matters for your analysis.
Performance considerations at scale
URL parsing involves string operations that can become expensive at scale, especially when processing billions of rows. Understanding performance characteristics helps you architect efficient data pipelines.
Function Costs In CPU Cycles
String parsing adds computational overhead proportional to your URL complexity and data volume. URLs with long paths, many query parameters, or unusual encoding patterns require more CPU cycles to process.
The function also allocates a new string for each result, which adds memory allocation overhead in high-throughput scenarios.
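One way to cut this allocation overhead is to store the precomputed domain as LowCardinality(String), which dictionary-encodes repeated values. A sketch, assuming the visits table from the earlier examples:

```sql
-- Dictionary-encode the cleaned domain: repeated domain strings are
-- stored once, reducing memory and I/O for domain-level GROUP BYs.
ALTER TABLE visits
    ADD COLUMN clean_domain LowCardinality(String)
    MATERIALIZED domainWithoutWWW(url);
```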
Using Projections Or Views
Precomputing cleaned domains often yields better query performance than on-the-fly parsing:
ALTER TABLE visits ADD PROJECTION domain_projection (
SELECT
domainWithoutWWW(url) AS clean_domain,
count()
GROUP BY clean_domain
);
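Note that ADD PROJECTION only affects parts written after the projection exists; to build it for data already in the table, materialize it explicitly:

```sql
-- Backfill the projection for parts inserted before it was added.
ALTER TABLE visits MATERIALIZE PROJECTION domain_projection;
```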
Materialized views can aggregate by clean domain during data ingestion, eliminating parsing overhead from analytical queries entirely.
Benchmark Results
In practice, on-the-fly parsing works well for exploratory queries and datasets under 100M rows. Beyond that scale, or for frequently accessed dashboard queries, precomputation through materialized columns or views typically provides a 2-5x performance improvement.
The exact performance gain depends on your URL complexity, query patterns, and hardware configuration, but the pattern holds consistently across different workloads.
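You can gauge the difference on your own data by timing the same aggregation against the raw url column and against a precomputed clean_domain column (assuming the materialized column added earlier):

```sql
-- On-the-fly: the domain is parsed from every url at query time.
SELECT domainWithoutWWW(url) AS d, count() FROM visits GROUP BY d;

-- Precomputed: reads the stored clean_domain column directly.
SELECT clean_domain AS d, count() FROM visits GROUP BY d;
```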
Building a real-time, ClickHouse-based web analytics API with Tinybird
Let's create a complete example that ingests web traffic data and exposes an API endpoint returning domain-level analytics using domainWithoutWWW.
This walkthrough shows how domain normalization integrates into real-world analytics workflows, from data ingestion through API deployment. You'll see exactly how the function works in practice.
Tinybird is a managed service that makes it easy to build real-time analytics APIs powered by ClickHouse. It handles the infrastructure, scaling, and API generation so you can focus on writing SQL and building analytics features.
In this example, we'll use Tinybird to:
- Create a data source for ingesting web traffic events
- Write a pipe that processes and aggregates the data using domainWithoutWWW
- Deploy an API endpoint that serves domain-level analytics
The workflow demonstrates how to combine ClickHouse's powerful URL processing with Tinybird's API capabilities to build production-ready analytics services.
1. Create the data source
First, create a data source to store your web traffic events:
SCHEMA >
`timestamp` DateTime64(3, 'UTC') `json:$.timestamp`,
`url` String `json:$.url`,
`user_id` UInt32 `json:$.user_id`,
`session_id` String `json:$.session_id`,
`page_title` String `json:$.page_title`,
`referrer` String `json:$.referrer`
ENGINE MergeTree
ENGINE_SORTING_KEY timestamp
ENGINE_PARTITION_KEY toYYYYMM(timestamp)
Now build the data source locally:
tb dev
This builds the data source on Tinybird Local (localhost:7181), which you'll need to have running. Then ingest some sample data using the Events API:
curl -X POST "http://localhost:7181/v0/events?name=web_traffic" \
-H "Content-Type: application/json" \
-d '{"timestamp": "2025-10-03 14:30:15.123", "url": "https://www.example.com/products", "user_id": 1001, "session_id": "sess_abc123", "page_title": "Products Page", "referrer": "https://www.google.com/search"}
{"timestamp": "2025-10-03 14:30:16.456", "url": "https://blog.example.com/article", "user_id": 1002, "session_id": "sess_def456", "page_title": "Blog Article", "referrer": "https://www.facebook.com"}
{"timestamp": "2025-10-03 14:30:17.789", "url": "https://www.github.com/tinybird", "user_id": 1003, "session_id": "sess_ghi789", "page_title": "GitHub Profile", "referrer": "https://www.twitter.com"}'
2. Write the pipe query
Create a pipe that aggregates traffic by normalized domain:
NODE traffic_events
SQL >
SELECT
timestamp,
domainWithoutWWW(url) AS clean_domain,
user_id,
session_id,
page_title,
domainWithoutWWW(referrer) AS clean_referrer
FROM web_traffic
WHERE domainWithoutWWW(url) != ''
AND timestamp >= now() - INTERVAL 7 DAY
NODE domain_analytics
SQL >
%
SELECT
clean_domain,
COUNT(*) AS total_visits,
COUNT(DISTINCT user_id) AS unique_users,
COUNT(DISTINCT session_id) AS unique_sessions,
COUNT(DISTINCT clean_referrer) AS unique_referrers,
MIN(timestamp) AS first_visit,
MAX(timestamp) AS last_visit
FROM traffic_events
WHERE
clean_domain = {{ String(domain) }}
AND timestamp >= {{ DateTime(date_from) }}
AND timestamp <= {{ DateTime(date_to) }}
GROUP BY clean_domain
ORDER BY total_visits DESC
LIMIT {{ Int32(limit, 100) }}
TYPE endpoint
The domainWithoutWWW function ensures consistent domain grouping regardless of www prefixes in your URLs.
3. Deploy and call the endpoint
Deploy your pipe to create the API endpoint:
tb --cloud deploy
Then call your new endpoint:
curl -H "Authorization: Bearer <your_token>" \
"https://api.tinybird.co/v0/pipes/domain_analytics.json?domain=example.com&limit=5"
The response returns neatly aggregated data with normalized domains:
{
...,
"data": [
{
"clean_domain": "example.com",
"total_visits": 1247,
"unique_users": 892,
"unique_sessions": 1156,
"unique_referrers": 23,
"first_visit": "2025-10-03 14:30:15",
"last_visit": "2025-10-03 15:45:22"
}
],
"rows": 1,
"statistics": {
"elapsed": 0.023,
"rows_read": 1000,
"bytes_read": 45000
}
}
Next steps with ClickHouse and Tinybird
ClickHouse is great for normalizing URL data to build clean domain-level analytics. For large datasets, ClickHouse lets you precompute clean domains through materialized columns or views to optimize query performance.
When you're ready to operationalize your ClickHouse-based analytics workflows into APIs, Tinybird enables you to deploy production APIs over your data transformations and manages the operational complexity of the underlying infrastructure.
Get started with a free Tinybird account to create real-time APIs over your domain analysis data.
FAQs about extracting domains in ClickHouse
Does domainWithoutWWW handle multiple subdomains like blog.www.example.com?
The function strips only a leading "www." prefix, leaving everything else intact. Because the www in blog.www.example.com is not at the start of the hostname, domainWithoutWWW('https://blog.www.example.com/article') returns 'blog.www.example.com' unchanged. By contrast, 'https://www.blog.example.com' becomes 'blog.example.com'.
Is the domainWithoutWWW function available in older ClickHouse versions?
Yes, domainWithoutWWW has been part of ClickHouse's core URL functions for many years and works across most maintained versions. You won't encounter compatibility issues unless you're running a very outdated ClickHouse installation.
Can I remove both the protocol and www together in one operation?
Absolutely. The domainWithoutWWW function handles both automatically, extracting the hostname from any valid URL format and removing the www prefix in a single operation. domainWithoutWWW('https://www.example.com/path?query=1') gives you 'example.com' directly.
Additional resources
- domainWithoutWWW function
- cutWWW function
- domain function
- URL functions in ClickHouse
- String functions in ClickHouse
- Working with URLs in Tinybird
- How to extract URL fragments without hash symbols (#) in ClickHouse
- How to extract the protocol of a URL in ClickHouse
- How to extract query parameters from URLs in ClickHouse