LowCardinality(T)

This data type wraps another data type, optimizing its storage and query performance by using dictionary encoding for columns with low cardinality.

Syntax

LowCardinality(data_type)

Arguments

  • data_type: Type. The underlying data type to be dictionary-encoded. This can be String, FixedString, Date, DateTime, or numeric types (excluding Decimal).

Description

The LowCardinality data type modifies the internal storage and processing of data by applying dictionary encoding. This technique can significantly boost the performance of SELECT queries, especially for columns that contain a limited number of unique values.

The effectiveness of LowCardinality depends on the diversity of the data. It generally provides better efficiency for both data storage and retrieval when a column has fewer than 10,000 distinct values. However, if a column contains more than 100,000 distinct values, using LowCardinality might lead to worse performance compared to standard data types.

For string columns, LowCardinality is often a more flexible and equally or more efficient alternative to the Enum data type.

Example

You typically define LowCardinality types when creating a Data Source schema.

lc_s.datasource
SCHEMA >
    `s` LowCardinality(String)

ENGINE "MergeTree"

You can also cast values to LowCardinality within a SELECT query:

SELECT toLowCardinality('Tinybird') AS lc_string;

Result:

┌─lc_string─┐
│ Tinybird  │
└───────────┘
Updated