arrayJoin function¶
arrayJoin¶
The arrayJoin function expands a single row into multiple rows, one for each element in a specified array. It effectively "unrolls" an array, duplicating the original row's other columns for each new row.
Syntax¶
arrayJoin(array)
Arguments¶
array:Array(T). The array whose elements will be expanded into individual rows.
Returns¶
T. The individual elements of the input array, one per row.
Example¶
SELECT arrayJoin([1, 2, 3] AS src) AS dst, 'Hello', src
Result:
┌─dst─┬─'Hello'─┬─src─────┐ │ 1 │ Hello │ [1,2,3] │ │ 2 │ Hello │ [1,2,3] │ │ 3 │ Hello │ [1,2,3] │ └─────┴───────────┴─────────┘
Usage in WHERE clauses¶
The arrayJoin function can also be used within a WHERE clause to filter rows based on array elements after expansion.
SELECT sum(1) AS impressions
FROM
(
SELECT ['Istanbul', 'Berlin', 'Bobruisk'] AS cities
)
WHERE arrayJoin(cities) IN ['Istanbul', 'Berlin']
Result:
┌─impressions─┐ │ 2 │ └─────────────┘
Multiple arrayJoin calls¶
When multiple arrayJoin functions are used in a single query, they produce a Cartesian product of the elements from each array.
SELECT
sum(1) AS impressions,
arrayJoin(cities) AS city,
arrayJoin(browsers) AS browser
FROM
(
SELECT
['Istanbul', 'Berlin', 'Bobruisk'] AS cities,
['Firefox', 'Chrome', 'Chrome'] AS browsers
)
GROUP BY
2,
3
Result:
┌─impressions─┬─city─────┬─browser─┐ │ 2 │ Istanbul │ Chrome │ │ 1 │ Istanbul │ Firefox │ │ 2 │ Berlin │ Chrome │ │ 1 │ Berlin │ Firefox │ │ 2 │ Bobruisk │ Chrome │ │ 1 │ Bobruisk │ Firefox │ └─────────────┴──────────┴─────────┘
Using multiple arrayJoin with same expression might not produce expected results due to optimizations. For those cases, consider modifying repeated array expression with extra operations that don't affect join result, for example arrayJoin(arraySort(arr)), arrayJoin(arrayConcat(arr, []))
Example:
SELECT
arrayJoin(dice) as first_throw,
/* arrayJoin(dice) as second_throw */ -- is technically correct, but will annihilate result set
arrayJoin(arrayConcat(dice, [])) as second_throw -- intentionally changed expression to force re-evaluation
FROM (
SELECT [1, 2, 3, 4, 5, 6] as dice
)
Result:
┌─first_throw─┬─second_throw─┐ │ 1 │ 1 │ │ 1 │ 2 │ │ 1 │ 3 │ │ 1 │ 4 │ │ 1 │ 5 │ │ 1 │ 6 │ │ 2 │ 1 │ │ 2 │ 2 │ │ 2 │ 3 │ │ 2 │ 4 │ │ 2 │ 5 │ │ 2 │ 6 │ │ 3 │ 1 │ │ 3 │ 2 │ │ 3 │ 3 │ │ 3 │ 4 │ │ 3 │ 5 │ │ 3 │ 6 │ │ 4 │ 1 │ │ 4 │ 2 │ │ 4 │ 3 │ │ 4 │ 4 │ │ 4 │ 5 │ │ 4 │ 6 │ │ 5 │ 1 │ │ 5 │ 2 │ │ 5 │ 3 │ │ 5 │ 4 │ │ 5 │ 5 │ │ 5 │ 6 │ │ 6 │ 1 │ │ 6 │ 2 │ │ 6 │ 3 │ │ 6 │ 4 │ │ 6 │ 5 │ │ 6 │ 6 │ └─────────────┴──────────────┘
The ARRAY JOIN Clause¶
The ARRAY JOIN clause is a SQL construct that allows you to expand multiple arrays into individual rows. When multiple arrays are specified with aliases, it performs an element-wise join, provided the arrays have the same number of elements. This differs from using multiple arrayJoin() functions, which would produce a Cartesian product.
Example¶
SELECT
sum(1) AS impressions,
city,
browser
FROM
(
SELECT
['Istanbul', 'Berlin', 'Bobruisk'] AS cities,
['Firefox', 'Chrome', 'Chrome'] AS browsers
)
ARRAY JOIN
cities AS city,
browsers AS browser
GROUP BY
2,
3
Result:
┌─impressions─┬─city─────┬─browser─┐ │ 1 │ Istanbul │ Firefox │ │ 1 │ Berlin │ Chrome │ │ 1 │ Bobruisk │ Chrome │ └─────────────┴──────────┴─────────┘
Alternatively, you can combine arrayZip with the arrayJoin function to process multiple related arrays as tuples:
SELECT
sum(1) AS impressions,
(arrayJoin(arrayZip(cities, browsers)) AS t).1 AS city,
t.2 AS browser
FROM
(
SELECT
['Istanbul', 'Berlin', 'Bobruisk'] AS cities,
['Firefox', 'Chrome', 'Chrome'] AS browsers
)
GROUP BY
2,
3
Result:
┌─impressions─┬─city─────┬─browser─┐ │ 1 │ Istanbul │ Firefox │ │ 1 │ Berlin │ Chrome │ │ 1 │ Bobruisk │ Chrome │ └─────────────┴──────────┴─────────┘