Join performance troubleshooting

Common issues and solutions for join performance in ClickHouse and Tinybird.

ClickHouse SQL Reference: The examples in this page use standard ClickHouse SQL syntax and settings. Some statements like CREATE TABLE, CREATE MATERIALIZED VIEW, INSERT, and SETTINGS clauses are not supported in Tinybird. For Tinybird-specific guidance, see the Tinybird optimization guide.

Found an issue with this documentation? Report it via the chat widget in the bottom right corner.

Common issues

Large table joins

Issue: Joining large tables without optimization

Solution: Use selective filters and ensure join keys are in sorting keys

-- Add filters before join
SELECT * FROM events l
JOIN dimension_table s ON l.id = s.id
WHERE l.date >= today() - 7

Missing join conditions

Issue: Cartesian products from missing join conditions

Solution: Always specify join conditions

-- Always specify join conditions
SELECT * FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id

Performance optimization

Replace JOINs with subqueries when possible

Issue: JOINs can be expensive, especially when the right table is large

Solution: Use subqueries instead of JOINs when you only need to filter

-- Instead of JOIN, use subquery filter
SELECT * FROM events
WHERE user_id IN (SELECT user_id FROM users WHERE status = 'active')

Filter the right table first

Issue: JOINs scanning the entire right table, causing performance issues

Solution: Filter the right table (which should be a small table or dimension table) using sorting key fields before joining

-- Filter right table using sorting key before join
-- The right table should be a small table or dimension table
SELECT a.id, a.value, b.value
FROM events a
LEFT JOIN (
  SELECT id, value
  FROM small_dimension_table
  WHERE id IN (SELECT id FROM events)
) b USING id

This pattern filters the right table using the join key, which should be in the sorting key of the right table. The right table should be small relative to the left table. This allows ClickHouse to use the index and significantly reduce the amount of data scanned.

Using appropriate join types

Issue: Using wrong join type for the use case

Solution: Choose appropriate join type

-- Use LEFT JOIN when you need all records from left table
SELECT * FROM users u
LEFT JOIN events e ON u.id = e.user_id

-- Use INNER JOIN for matching records only
SELECT * FROM users u
INNER JOIN events e ON u.id = e.user_id

Memory management

Memory pressure during joins

Issue: Joins consuming too much memory

Solution: Filter the right table (small table) before joining and process in chunks

-- Filter right table and process in chunks by date ranges
SELECT * FROM events e
JOIN (
  SELECT id, name
  FROM small_dimension_table
  WHERE id IN (SELECT DISTINCT id FROM events WHERE date BETWEEN '2023-01-01' AND '2023-01-07')
) d ON e.id = d.id
WHERE e.date BETWEEN '2023-01-01' AND '2023-01-07'

Chunked join processing

Issue: Processing entire datasets in single join

Solution: Process joins in chunks by filtering both tables

-- Process joins by date ranges, filtering the right table (small table) first
SELECT * FROM events e
JOIN (
  SELECT user_id, name
  FROM small_users_table
  WHERE user_id IN (SELECT DISTINCT user_id FROM events WHERE date BETWEEN '2023-01-01' AND '2023-01-07')
) u ON e.user_id = u.user_id
WHERE e.date BETWEEN '2023-01-01' AND '2023-01-07'

Best practices

  1. Replace JOINs with subqueries when possible - Use WHERE column IN (SELECT column FROM table) instead of JOINs when you only need to filter
  2. Use small tables on the right - The right table in a JOIN should be a small table or dimension table
  3. Filter the right table first - Filter the right side of the JOIN using sorting key fields to reduce scan size
  4. Use sorting keys - Ensure join keys are in the sorting key of the right table for optimal performance
  5. Choose right join type - Use INNER, LEFT, RIGHT based on needs
  6. Filter before joining - Apply WHERE clauses to the left table before joins
Updated