Functions for searching in strings¶
All functions in this section search case-sensitively by default. Case-insensitive search is usually provided by separate function variants.
Functions in this section also assume that the searched string, referred to in this section as haystack, and the search string, referred to in this section as needle, are single-byte encoded text. If this assumption is violated, no exception is thrown and results are undefined. Search with UTF-8 encoded strings is usually provided by separate function variants. Likewise, if a UTF-8 function variant is used and the input strings aren't UTF-8 encoded text, no exception is thrown and the results are undefined. Note that no automatic Unicode normalization is performed, however you can use the normalizeUTF8*() functions for that.
General strings functions and functions for replacing in strings are described separately.
position¶
Finds the starting position of a substring within a string. The search is case-sensitive and positions are 1-based.
Syntax¶
position(haystack, needle[, start_pos])
Alias:
position(needle IN haystack)
Arguments¶
haystack: String in which the search is performed. String or Enum.needle: Substring to be searched. String.start_pos: Position (1-based) inhaystackat which the search starts. UInt. Optional.
Returns¶
The 1-based starting position of the needle if found. Returns 0 if the needle is not found. UInt64.
If substring needle is empty, these rules apply:
- if no
start_poswas specified: return1 - if
start_pos = 0: return1 - if
start_pos >= 1andstart_pos <= length(haystack) + 1: returnstart_pos - otherwise: return
0
The same rules also apply to functions locate, positionCaseInsensitive, positionUTF8 and positionCaseInsensitiveUTF8.
Example¶
SELECT position('Hello, world!', '!')
Result:
┌─position('Hello, world!', '!')─┐
│ 13 │
└────────────────────────────────┘
Example with start_pos argument:
SELECT
position('Hello, world!', 'o', 1),
position('Hello, world!', 'o', 7)
Result:
┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│ 5 │ 9 │
└───────────────────────────────────┴───────────────────────────────────┘
Example for needle IN haystack syntax:
SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s)
Result:
┌─equals(6, position(s, '/'))─┐ │ 1 │ └─────────────────────────────┘
Examples with empty needle substring:
SELECT
position('abc', ''),
position('abc', '', 0),
position('abc', '', 1),
position('abc', '', 2),
position('abc', '', 3),
position('abc', '', 4),
position('abc', '', 5)
Result:
┌─position('abc', '')─┬─position('abc', '', 0)─┬─position('abc', '', 1)─┬─position('abc', '', 2)─┬─position('abc', '', 3)─┬─position('abc', '', 4)─┬─position('abc', '', 5)─┐
│ 1 │ 1 │ 1 │ 2 │ 3 │ 4 │ 0 │
└─────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┘
locate¶
This function is similar to position, but its arguments are ordered differently, with the needle appearing before the haystack.
Syntax¶
locate(needle, haystack[, start_pos])
Arguments¶
needle: Substring to be searched. String.haystack: String in which the search is performed. String or Enum.start_pos: Position (1-based) inhaystackat which the search starts. UInt. Optional.
Returns¶
The 1-based starting position of the needle if found. Returns 0 if the needle is not found. UInt64.
Example¶
SELECT locate('world', 'Hello, world!')
Result:
┌─locate('world', 'Hello, world!')─┐
│ 8 │
└──────────────────────────────────┘
positionCaseInsensitive¶
Finds the starting position of a substring within a string, ignoring case differences. Positions are 1-based.
Syntax¶
positionCaseInsensitive(haystack, needle[, start_pos])
Arguments¶
haystack: String in which the search is performed. String or Enum.needle: Substring to be searched. String.start_pos: Position (1-based) inhaystackat which the search starts. UInt. Optional.
Returns¶
The 1-based starting position of the needle if found. Returns 0 if the needle is not found. UInt64.
Example¶
SELECT positionCaseInsensitive('Hello, world!', 'hello')
Result:
┌─positionCaseInsensitive('Hello, world!', 'hello')─┐
│ 1 │
└───────────────────────────────────────────────────┘
positionUTF8¶
Finds the starting position of a substring within a UTF-8 encoded string. This function correctly handles multi-byte UTF-8 characters.
Syntax¶
positionUTF8(haystack, needle[, start_pos])
Arguments¶
haystack: UTF-8 encoded string in which the search is performed. String or Enum.needle: UTF-8 encoded substring to be searched. String.start_pos: Position (1-based) inhaystackat which the search starts. UInt. Optional.
Returns¶
The 1-based starting position of the needle if found. Returns 0 if the needle is not found. UInt64.
Example¶
Function positionUTF8 correctly counts character ö (represented by two points) as a single Unicode codepoint:
SELECT positionUTF8('Motörhead', 'r')
Result:
┌─position('Motörhead', 'r')─┐
│ 5 │
└────────────────────────────┘
positionCaseInsensitiveUTF8¶
Finds the starting position of a substring within a UTF-8 encoded string, ignoring case differences.
Syntax¶
positionCaseInsensitiveUTF8(haystack, needle[, start_pos])
Arguments¶
haystack: UTF-8 encoded string in which the search is performed. String or Enum.needle: UTF-8 encoded substring to be searched. String.start_pos: Position (1-based) inhaystackat which the search starts. UInt. Optional.
Returns¶
The 1-based starting position of the needle if found. Returns 0 if the needle is not found. UInt64.
Example¶
SELECT positionCaseInsensitiveUTF8('Motörhead', 'HEAD')
Result:
┌─positionCaseInsensitiveUTF8('Motörhead', 'HEAD')─┐
│ 6 │
└──────────────────────────────────────────────────┘
multiSearchAllPositions¶
Searches for multiple substrings within a string and returns an array containing the 1-based starting position for each needle.
All multiSearch*() functions only support up to 2<sup>8</sup> needles.
Syntax¶
multiSearchAllPositions(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: String in which the search is performed. String.needle: An array of substrings to be searched. Array.
Returns¶
An array of UInt64 values. Each value represents the 1-based starting position of the corresponding needle in the haystack. A value of 0 indicates the needle was not found.
Example¶
SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])
Result:
┌─multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])─┐
│ [0,13,0] │
└───────────────────────────────────────────────────────────────────┘
multiSearchAllPositionsCaseInsensitive¶
Searches for multiple substrings within a string, ignoring case, and returns an array containing the 1-based starting position for each needle.
Syntax¶
multiSearchAllPositionsCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: String in which the search is performed. String.needle: An array of substrings to be searched. Array.
Returns¶
An array of UInt64 values. Each value represents the 1-based starting position of the corresponding needle in the haystack. A value of 0 indicates the needle was not found.
Example¶
SELECT multiSearchAllPositionsCaseInsensitive('Tinybird',['t','b'])
Result:
[1,5]
multiSearchAllPositionsUTF8¶
Searches for multiple UTF-8 encoded substrings within a UTF-8 encoded string and returns an array containing the 1-based starting position for each needle.
Syntax¶
multiSearchAllPositionsUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: UTF-8 encoded string in which the search is performed. String.needle: An array of UTF-8 encoded substrings to be searched. Array.
Returns¶
An array of UInt64 values. Each value represents the 1-based starting position of the corresponding needle in the haystack. A value of 0 indicates the needle was not found.
Example¶
Given Tinybird as a UTF-8 string, find the positions of T (\x54) and b (\x62).
SELECT multiSearchAllPositionsUTF8('\x54\x69\x6e\x79\x62\x69\x72\x64',['\x54','\x62'])
Result:
[1,5]
multiSearchAllPositionsCaseInsensitiveUTF8¶
Searches for multiple UTF-8 encoded substrings within a UTF-8 encoded string, ignoring case, and returns an array containing the 1-based starting position for each needle.
Syntax¶
multiSearchAllPositionsCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: UTF-8 encoded string in which the search is performed. String.needle: An array of UTF-8 encoded substrings to be searched. Array.
Returns¶
An array of UInt64 values. Each value represents the 1-based starting position of the corresponding needle in the haystack. A value of 0 indicates the needle was not found.
Example¶
Given Tinybird as a UTF-8 string, find the positions of t (\x74) and B (\x42).
SELECT multiSearchAllPositionsCaseInsensitiveUTF8('\x54\x69\x6e\x79\x62\x69\x72\x64',['\x74','\x42'])
Result:
[1,5]
multiSearchFirstPosition¶
Finds the leftmost starting position of any of the provided needle substrings within a haystack string.
Functions multiSearchFirstPositionCaseInsensitive, multiSearchFirstPositionUTF8 and multiSearchFirstPositionCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.
Syntax¶
multiSearchFirstPosition(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: String in which the search is performed. String.needle: An array of substrings to be searched. Array.
Returns¶
The 1-based starting position of the first matching needle found. Returns 0 if none of the needle substrings are found. UInt64.
Example¶
SELECT multiSearchFirstPosition('Hello World',['llo', 'Wor', 'ld'])
Result:
3
multiSearchFirstPositionCaseInsensitive¶
Finds the leftmost starting position of any of the provided needle substrings within a haystack string, ignoring case.
Syntax¶
multiSearchFirstPositionCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: String in which the search is performed. String.needle: An array of substrings to be searched. Array.
Returns¶
The 1-based starting position of the first matching needle found, ignoring case. Returns 0 if none of the needle substrings are found. UInt64.
Example¶
SELECT multiSearchFirstPositionCaseInsensitive('HELLO WORLD',['wor', 'ld', 'ello'])
Result:
2
multiSearchFirstPositionUTF8¶
Finds the leftmost starting position of any of the provided UTF-8 encoded needle substrings within a UTF-8 encoded haystack string.
Syntax¶
multiSearchFirstPositionUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: UTF-8 encoded string in which the search is performed. String.needle: An array of UTF-8 encoded substrings to be searched. Array.
Returns¶
The 1-based starting position of the first matching needle found. Returns 0 if none of the needle substrings are found. UInt64.
Example¶
Find the leftmost offset in UTF-8 string hello world which matches any of the given needles.
SELECT multiSearchFirstPositionUTF8('\x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64',['wor', 'ld', 'ello'])
Result:
2
multiSearchFirstPositionCaseInsensitiveUTF8¶
Finds the leftmost starting position of any of the provided UTF-8 encoded needle substrings within a UTF-8 encoded haystack string, ignoring case.
Syntax¶
multiSearchFirstPositionCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: UTF-8 encoded string in which the search is performed. String.needle: An array of UTF-8 encoded substrings to be searched. Array.
Returns¶
The 1-based starting position of the first matching needle found, ignoring case. Returns 0 if none of the needle substrings are found. UInt64.
Example¶
Find the leftmost offset in UTF-8 string HELLO WORLD which matches any of the given needles.
SELECT multiSearchFirstPositionCaseInsensitiveUTF8('\x48\x45\x4c\x4c\x4f\x20\x57\x4f\x52\x4c\x44',['wor', 'ld', 'ello'])
Result:
2
multiSearchFirstIndex¶
Returns the 1-based index of the first needle in the provided array that is found earliest in the haystack string.
Functions multiSearchFirstIndexCaseInsensitive, multiSearchFirstIndexUTF8 and multiSearchFirstIndexCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.
Syntax¶
multiSearchFirstIndex(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: String in which the search is performed. String.needle: An array of substrings to be searched. Array.
Returns¶
The 1-based index of the first needle from the array that matches. Returns 0 if no needle is found. UInt8.
Example¶
SELECT multiSearchFirstIndex('Hello World',['World','Hello'])
Result:
1
multiSearchFirstIndexCaseInsensitive¶
Returns the 1-based index of the first needle in the provided array that is found earliest in the haystack string, ignoring case.
Syntax¶
multiSearchFirstIndexCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: String in which the search is performed. String.needle: An array of substrings to be searched. Array.
Returns¶
The 1-based index of the first needle from the array that matches, ignoring case. Returns 0 if no needle is found. UInt8.
Example¶
SELECT multiSearchFirstIndexCaseInsensitive('hElLo WoRlD',['World','Hello'])
Result:
1
multiSearchFirstIndexUTF8¶
Returns the 1-based index of the first UTF-8 encoded needle in the provided array that is found earliest in the UTF-8 encoded haystack string.
Syntax¶
multiSearchFirstIndexUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: UTF-8 encoded string in which the search is performed. String.needle: An array of UTF-8 encoded substrings to be searched. Array.
Returns¶
The 1-based index of the first needle from the array that matches. Returns 0 if no needle is found. UInt8.
Example¶
Given Hello World as a UTF-8 string, find the first index of UTF-8 strings Hello and World.
SELECT multiSearchFirstIndexUTF8('\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64',['\x57\x6f\x72\x6c\x64','\x48\x65\x6c\x6c\x6f'])
Result:
1
multiSearchFirstIndexCaseInsensitiveUTF8¶
Returns the 1-based index of the first UTF-8 encoded needle in the provided array that is found earliest in the UTF-8 encoded haystack string, ignoring case.
Syntax¶
multiSearchFirstIndexCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: UTF-8 encoded string in which the search is performed. String.needle: An array of UTF-8 encoded substrings to be searched. Array.
Returns¶
The 1-based index of the first needle from the array that matches, ignoring case. Returns 0 if no needle is found. UInt8.
Example¶
Given HELLO WORLD as a UTF-8 string, find the first index of UTF-8 strings hello and world.
SELECT multiSearchFirstIndexCaseInsensitiveUTF8('\x48\x45\x4c\x4c\x4f\x20\x57\x4f\x52\x4c\x44',['\x68\x65\x6c\x6c\x6f','\x77\x6f\x72\x6c\x64'])
Result:
1
multiSearchAny¶
Checks if any of the provided needle substrings exist within the haystack string.
Functions multiSearchAnyCaseInsensitive, multiSearchAnyUTF8 and multiSearchAnyCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.
Syntax¶
multiSearchAny(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: String in which the search is performed. String.needle: An array of substrings to be searched. Array.
Returns¶
Returns 1 if at least one needle is found in the haystack, otherwise returns 0. UInt8.
Example¶
SELECT multiSearchAny('Tinybird',['C','H'])
Result:
0
multiSearchAnyCaseInsensitive¶
Checks if any of the provided needle substrings exist within the haystack string, ignoring case.
Syntax¶
multiSearchAnyCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: String in which the search is performed. String.needle: An array of substrings to be searched. Array.
Returns¶
Returns 1 if at least one needle is found in the haystack (case-insensitive), otherwise returns 0. UInt8.
Example¶
SELECT multiSearchAnyCaseInsensitive('Tinybird',['t','b'])
Result:
1
multiSearchAnyUTF8¶
Checks if any of the provided UTF-8 encoded needle substrings exist within the UTF-8 encoded haystack string.
Syntax¶
multiSearchAnyUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: UTF-8 encoded string in which the search is performed. String.needle: An array of UTF-8 encoded substrings to be searched. Array.
Returns¶
Returns 1 if at least one needle is found in the haystack, otherwise returns 0. UInt8.
Example¶
Given Tinybird as a UTF-8 string, check if there are any T (\x54) or b (\x62) letters in the word.
SELECT multiSearchAnyUTF8('\x54\x69\x6e\x79\x62\x69\x72\x64',['\x54','\x62'])
Result:
1
multiSearchAnyCaseInsensitiveUTF8¶
Checks if any of the provided UTF-8 encoded needle substrings exist within the UTF-8 encoded haystack string, ignoring case.
Syntax¶
multiSearchAnyCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
Arguments¶
haystack: UTF-8 encoded string in which the search is performed. String.needle: An array of UTF-8 encoded substrings to be searched. Array.
Returns¶
Returns 1 if at least one needle is found in the haystack (case-insensitive), otherwise returns 0. UInt8.
Example¶
Given Tinybird as a UTF-8 string, check if there is any letter y(\x79) in the word, ignoring case.
SELECT multiSearchAnyCaseInsensitiveUTF8('\x54\x69\x6e\x79\x62\x69\x72\x64',['\x79'])
Result:
1
match¶
Determines if a string matches a given regular expression pattern.
Syntax¶
match(haystack, pattern)
Alias: haystack REGEXP pattern operator
Arguments¶
haystack: The string to search within. String.pattern: The regular expression pattern to match, using re2 regular expression syntax. String.
Returns¶
Returns 1 if the haystack matches the pattern, otherwise returns 0. UInt8.
Example¶
SELECT match('Tinybird is great', '.*great')
Result:
┌─match('Tinybird is great', '.*great')─┐
│ 1 │
└───────────────────────────────────────┘
multiMatchAny¶
Checks if a string matches any of the provided regular expression patterns.
Functions in the multi[Fuzzy]Match*() family use the Vectorscan library. As such, they are only enabled if ClickHouse is compiled with support for Vectorscan.
To turn off all functions that use Hyperscan, use setting SET allow_hyperscan = 0;.
Due to restrictions of Vectorscan, the length of the haystack string must be less than 2<sup>32</sup> bytes.
Hyperscan is generally vulnerable to regular expression denial of service (ReDoS) attacks. Users are advised to check the provided patterns carefully.
Syntax¶
multiMatchAny(haystack, [pattern1, pattern2, ..., patternN])
Arguments¶
haystack: The string to search within. String.pattern: An array of regular expression patterns to match, using re2 regular expression syntax. Array.
Returns¶
Returns 1 if the haystack matches at least one of the patterns, otherwise returns 0. UInt8.
Example¶
SELECT multiMatchAny('Tinybird is fast', ['slow', 'fast'])
Result:
┌─multiMatchAny('Tinybird is fast', ['slow', 'fast'])─┐
│ 1 │
└─────────────────────────────────────────────────────┘
multiMatchAnyIndex¶
Returns the 1-based index of the first regular expression pattern in the provided array that matches the haystack string.
Syntax¶
multiMatchAnyIndex(haystack, [pattern1, pattern2, ..., patternN])
Arguments¶
haystack: The string to search within. String.pattern: An array of regular expression patterns to match, using re2 regular expression syntax. Array.
Returns¶
The 1-based index of the first matching pattern from the array. Returns 0 if no pattern matches. UInt8.
Example¶
SELECT multiMatchAnyIndex('Tinybird is awesome', ['bad', 'awesome', 'slow'])
Result:
┌─multiMatchAnyIndex('Tinybird is awesome', ['bad', 'awesome', 'slow'])─┐
│ 2 │
└───────────────────────────────────────────────────────────────────────┘
multiMatchAllIndices¶
Returns an array of all 1-based indices of the regular expression patterns that match the haystack string.
Syntax¶
multiMatchAllIndices(haystack, [pattern1, pattern2, ..., patternN])
Arguments¶
haystack: The string to search within. String.pattern: An array of regular expression patterns to match, using re2 regular expression syntax. Array.
Returns¶
An array of UInt8 values, representing the 1-based indices of all matching patterns. The order of indices is not guaranteed. Array(UInt8).
Example¶
SELECT multiMatchAllIndices('Tinybird is fast and scalable', ['fast', 'slow', 'scalable'])
Result:
┌─multiMatchAllIndices('Tinybird is fast and scalable', ['fast', 'slow', 'scalable'])─┐
│ [1,3] │
└─────────────────────────────────────────────────────────────────────────────────────┘
multiFuzzyMatchAny¶
Checks if a string matches any of the provided regular expression patterns within a specified edit distance.
multiFuzzyMatch*() function family do not support UTF-8 regular expressions (it treats them as a sequence of bytes) due to restrictions of Hyperscan.
Syntax¶
multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, ..., patternN])
Arguments¶
haystack: The string to search within. String.distance: The maximum allowed edit distance for a match. UInt.pattern: An array of regular expression patterns to match. Array.
Returns¶
Returns 1 if the haystack matches at least one pattern within the given distance, otherwise returns 0. UInt8.
Example¶
SELECT multiFuzzyMatchAny('Tinybird', 1, ['Tynybird', 'Tinybirdz'])
Result:
┌─multiFuzzyMatchAny('Tinybird', 1, ['Tynybird', 'Tinybirdz'])─┐
│ 1 │
└──────────────────────────────────────────────────────────────┘
multiFuzzyMatchAnyIndex¶
Returns the 1-based index of the first regular expression pattern in the provided array that matches the haystack string within a specified edit distance.
Syntax¶
multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, ..., patternN])
Arguments¶
haystack: The string to search within. String.distance: The maximum allowed edit distance for a match. UInt.pattern: An array of regular expression patterns to match. Array.
Returns¶
The 1-based index of the first matching pattern from the array within the given distance. Returns 0 if no pattern matches. UInt8.
Example¶
SELECT multiFuzzyMatchAnyIndex('Tinybird', 1, ['Tynybird', 'Tinybirdz'])
Result:
┌─multiFuzzyMatchAnyIndex('Tinybird', 1, ['Tynybird', 'Tinybirdz'])─┐
│ 1 │
└───────────────────────────────────────────────────────────────────┘
multiFuzzyMatchAllIndices¶
Returns an array of all 1-based indices of the regular expression patterns that match the haystack string within a specified edit distance.
Syntax¶
multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, ..., patternN])
Arguments¶
haystack: The string to search within. String.distance: The maximum allowed edit distance for a match. UInt.pattern: An array of regular expression patterns to match. Array.
Returns¶
An array of UInt8 values, representing the 1-based indices of all matching patterns within the given distance. The order of indices is not guaranteed. Array(UInt8).
Example¶
SELECT multiFuzzyMatchAllIndices('Tinybird', 1, ['Tynybird', 'Tinybirdz', 'Tinybird'])
Result:
┌─multiFuzzyMatchAllIndices('Tinybird', 1, ['Tynybird', 'Tinybirdz', 'Tinybird'])─┐
│ [1,2,3] │
└─────────────────────────────────────────────────────────────────────────────────┘
extract¶
Extracts the first substring from haystack that matches the provided regular expression pattern.
Syntax¶
extract(haystack, pattern)
Arguments¶
haystack: The input string to search. String.pattern: The regular expression pattern to match, using re2 regular expression syntax. String.
Returns¶
The first substring that matches the pattern. If the pattern contains capturing groups, it returns the match for the first capturing group. Returns an empty string if no match is found. String.
Example¶
SELECT extract('number: 1, number: 2, number: 3', '\\d+') AS result
Result:
┌─result─┐ │ 1 │ └────────┘
extractAll¶
Extracts all substrings from haystack that match the provided regular expression pattern.
Syntax¶
extractAll(haystack, pattern)
Arguments¶
haystack: The input string to search. String.pattern: The regular expression pattern to match, using re2 regular expression syntax. String.
Returns¶
An array of all substrings that match the pattern. If the pattern contains capturing groups, it returns matches for the first capturing group. Returns an empty array if no matches are found. Array(String).
Example¶
SELECT extractAll('number: 1, number: 2, number: 3', '\\d+') AS result
Result:
┌─result────────┐ │ ['1','2','3'] │ └───────────────┘
extractAllGroupsHorizontal¶
Extracts all matches for each capturing group in a regular expression, returning an array of arrays where each inner array contains all matches for a specific group.
This function is slower than extractAllGroupsVertical.
Syntax¶
extractAllGroupsHorizontal(haystack, pattern)
Arguments¶
haystack: The input string to search. String.pattern: The regular expression pattern with capturing groups, using re2 regular expression syntax. String.
Returns¶
An array of arrays of strings. The first inner array contains all matches for the first capturing group, the second for the second, and so on. Array(Array(String)).
If haystack doesn't match the pattern regex, an array of empty arrays is returned.
Example¶
SELECT extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')
Result:
┌─extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','def','ghi'],['111','222','333']] │
└──────────────────────────────────────────────────────────────────────────────────────────┘
extractGroups¶
Extracts the first set of matches for all capturing groups in a regular expression.
Syntax¶
extractGroups(haystack, pattern)
Arguments¶
haystack: The input string to search. String.pattern: The regular expression pattern with capturing groups, using re2 regular expression syntax. String.
Returns¶
An array of strings, where each element corresponds to the first match of a capturing group. Array(String).
Example¶
SELECT extractGroups('hello abc=111 world', '("[^"]+"|\\w+)=("[^"]+"|\\w+)') AS result
Result:
┌─result────────┐ │ ['abc','111'] │ └───────────────┘
extractAllGroupsVertical¶
Extracts all matches for all capturing groups in a regular expression, returning an array of arrays where each inner array represents a complete match of all groups.
Syntax¶
extractAllGroupsVertical(haystack, pattern)
Arguments¶
haystack: The input string to search. String.pattern: The regular expression pattern with capturing groups, using re2 regular expression syntax. String.
Returns¶
An array of arrays of strings. Each inner array contains the matches for all capturing groups from a single overall match of the pattern. Array(Array(String)).
If haystack doesn't match the pattern regex, an empty array is returned.
Example¶
SELECT extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')
Result:
┌─extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','111'],['def','222'],['ghi','333']] │
└────────────────────────────────────────────────────────────────────────────────────────┘
like¶
Checks if a string matches a pattern using SQL LIKE syntax, which supports % for any sequence of characters and _ for any single character.
Syntax¶
like(haystack, pattern)
Alias: haystack LIKE pattern (operator)
Arguments¶
haystack: The string to search within. String.pattern: TheLIKEpattern to match. String.
Returns¶
Returns 1 if the haystack matches the pattern, otherwise returns 0. UInt8.
Example¶
SELECT like('Tinybird', 'Tiny%bird')
Result:
┌─like('Tinybird', 'Tiny%bird')─┐
│ 1 │
└───────────────────────────────┘
notLike¶
Checks if a string does NOT match a pattern using SQL LIKE syntax. This is the inverse of the like function.
Syntax¶
notLike(haystack, pattern)
Alias: haystack NOT LIKE pattern (operator)
Arguments¶
haystack: The string to search within. String.pattern: TheLIKEpattern to match. String.
Returns¶
Returns 1 if the haystack does not match the pattern, otherwise returns 0. UInt8.
Example¶
SELECT notLike('Tinybird', 'Tiny%house')
Result:
┌─notLike('Tinybird', 'Tiny%house')─┐
│ 1 │
└───────────────────────────────────┘
ilike¶
Checks if a string matches a pattern using SQL LIKE syntax, ignoring case differences.
Syntax¶
ilike(haystack, pattern)
Alias: haystack ILIKE pattern (operator)
Arguments¶
haystack: The string to search within. String.pattern: TheLIKEpattern to match. String.
Returns¶
Returns 1 if the haystack matches the pattern (case-insensitive), otherwise returns 0. UInt8.
Example¶
SELECT ilike('Tinybird', 'tiny%bird')
Result:
┌─ilike('Tinybird', 'tiny%bird')─┐
│ 1 │
└────────────────────────────────┘
notILike¶
Checks if a string does NOT match a pattern using SQL LIKE syntax, ignoring case differences. This is the inverse of the ilike function.
Syntax¶
notILike(haystack, pattern)
Alias: haystack NOT ILIKE pattern (operator)
Arguments¶
haystack: The string to search within. String.pattern: TheLIKEpattern to match. String.
Returns¶
Returns 1 if the haystack does not match the pattern (case-insensitive), otherwise returns 0. UInt8.
Example¶
SELECT notILike('Tinybird', 'tiny%house')
Result:
┌─notILike('Tinybird', 'tiny%house')─┐
│ 1 │
└────────────────────────────────────┘
ngramDistance¶
Calculates the 4-gram distance between two strings, indicating their similarity. A value closer to 0 means higher similarity, while a value closer to 1 means lower similarity.
Functions ngramDistanceCaseInsensitive, ngramDistanceUTF8, ngramDistanceCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.
Syntax¶
ngramDistance(haystack, needle)
Arguments¶
haystack: The first string for comparison. String.needle: The second string for comparison. String.
Returns¶
A Float32 value between 0 and 1, representing the normalized 4-gram distance. Float32.
Example¶
The more similar two strings are to each other, the closer the result will be to 0 (identical).
SELECT ngramDistance('Tinybird','Tinybird!')
Result:
0.06666667
The less similar two strings are to each, the larger the result will be.
SELECT ngramDistance('Tinybird','House')
Result:
0.5555556
ngramDistanceCaseInsensitive¶
Calculates the 4-gram distance between two strings, ignoring case, to indicate their similarity.
Syntax¶
ngramDistanceCaseInsensitive(haystack, needle)
Arguments¶
haystack: The first string for comparison. String.needle: The second string for comparison. String.
Returns¶
A Float32 value between 0 and 1, representing the normalized 4-gram distance, ignoring case. Float32.
Example¶
With ngramDistance differences in case will affect the similarity value:
SELECT ngramDistance('Tinybird','tinybird')
Result:
0.71428573
With ngramDistanceCaseInsensitive case is ignored so two identical strings differing only in case will now return a low similarity value:
SELECT ngramDistanceCaseInsensitive('Tinybird','tinybird')
Result:
0
ngramDistanceUTF8¶
Calculates the 4-gram distance between two UTF-8 encoded strings, indicating their similarity.
Syntax¶
ngramDistanceUTF8(haystack, needle)
Arguments¶
haystack: The first UTF-8 encoded string for comparison. String.needle: The second UTF-8 encoded string for comparison. String.
Returns¶
A Float32 value between 0 and 1, representing the normalized 4-gram distance for UTF-8 strings. Float32.
Example¶
SELECT ngramDistanceUTF8('abcde','cde')
Result:
0.5
ngramDistanceCaseInsensitiveUTF8¶
Calculates the 4-gram distance between two UTF-8 encoded strings, ignoring case, to indicate their similarity.
Syntax¶
ngramDistanceCaseInsensitiveUTF8(haystack, needle)
Arguments¶
haystack: The first UTF-8 encoded string for comparison. String.needle: The second UTF-8 encoded string for comparison. String.
Returns¶
A Float32 value between 0 and 1, representing the normalized 4-gram distance for UTF-8 strings, ignoring case. Float32.
Example¶
SELECT ngramDistanceCaseInsensitiveUTF8('abcde','CDE')
Result:
0.5
ngramSearch¶
Calculates a non-symmetric n-gram similarity score between two strings, useful for fuzzy string searching. A higher value (closer to 1) suggests a higher likelihood of needle being contained within haystack.
Functions ngramSearchCaseInsensitive, ngramSearchUTF8, ngramSearchCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.
Syntax¶
ngramSearch(haystack, needle)
Arguments¶
haystack: The string to search within. String.needle: The string to search for. String.
Returns¶
A Float32 value between 0 and 1, representing the likelihood of needle being in haystack. Float32.
Example¶
SELECT ngramSearch('Hello World','World Hello')
Result:
0.5
ngramSearchCaseInsensitive¶
Calculates a non-symmetric n-gram similarity score between two strings, ignoring case, for fuzzy string searching.
Syntax¶
ngramSearchCaseInsensitive(haystack, needle)
Arguments¶
haystack: The string to search within. String.needle: The string to search for. String.
Returns¶
A Float32 value between 0 and 1, representing the likelihood of needle being in haystack, ignoring case. Float32.
Example¶
SELECT ngramSearchCaseInsensitive('Hello World','hello')
Result:
1
ngramSearchUTF8¶
Calculates a non-symmetric n-gram similarity score between two UTF-8 encoded strings for fuzzy string searching.
Syntax¶
ngramSearchUTF8(haystack, needle)
Arguments¶
haystack: The UTF-8 encoded string to search within. String.needle: The UTF-8 encoded string to search for. String.
Returns¶
A Float32 value between 0 and 1, representing the likelihood of needle being in haystack (UTF-8). Float32.
Example¶
SELECT ngramSearchUTF8('абвгдеёжз', 'гдеёзд')
Result:
0.5
ngramSearchCaseInsensitiveUTF8¶
Calculates a non-symmetric n-gram similarity score between two UTF-8 encoded strings, ignoring case, for fuzzy string searching.
Syntax¶
ngramSearchCaseInsensitiveUTF8(haystack, needle)
Arguments¶
haystack: The UTF-8 encoded string to search within. String.needle: The UTF-8 encoded string to search for. String.
Returns¶
A Float32 value between 0 and 1, representing the likelihood of needle being in haystack (UTF-8, case-insensitive). Float32.
Example¶
SELECT ngramSearchCaseInsensitiveUTF8('абвГДЕёжз', 'АбвгдЕЁжз')
Result:
0.57142854
countSubstrings¶
Counts the number of non-overlapping occurrences of a substring within a string.
Functions countSubstringsCaseInsensitive and countSubstringsCaseInsensitiveUTF8 provide case-insensitive and case-insensitive + UTF-8 variants of this function respectively.
Syntax¶
countSubstrings(haystack, needle[, start_pos])
Arguments¶
haystack: String in which the search is performed. String or Enum.needle: Substring to be counted. String.start_pos: Position (1-based) inhaystackat which the search starts. UInt. Optional.
Returns¶
The total number of times the needle substring appears in the haystack. UInt64.
Example¶
SELECT countSubstrings('aaaa', 'aa')
Result:
┌─countSubstrings('aaaa', 'aa')─┐
│ 2 │
└───────────────────────────────┘
Example with start_pos argument:
SELECT countSubstrings('abc___abc', 'abc', 4)
Result:
┌─countSubstrings('abc___abc', 'abc', 4)─┐
│ 1 │
└────────────────────────────────────────┘
countSubstringsCaseInsensitive¶
Counts the number of non-overlapping occurrences of a substring within a string, ignoring case.
Syntax¶
countSubstringsCaseInsensitive(haystack, needle[, start_pos])
Arguments¶
haystack: String in which the search is performed. String or Enum.needle: Substring to be counted. String.start_pos: Position (1-based) inhaystackat which the search starts. UInt. Optional.
Returns¶
The total number of times the needle substring appears in the haystack, ignoring case. UInt64.
Example¶
SELECT countSubstringsCaseInsensitive('AAAA', 'aa')
Result:
┌─countSubstringsCaseInsensitive('AAAA', 'aa')─┐
│ 2 │
└──────────────────────────────────────────────┘
Example with start_pos argument:
SELECT countSubstringsCaseInsensitive('abc___ABC___abc', 'abc', 4)
Result:
┌─countSubstringsCaseInsensitive('abc___ABC___abc', 'abc', 4)─┐
│ 2 │
└─────────────────────────────────────────────────────────────┘
countSubstringsCaseInsensitiveUTF8¶
Counts the number of non-overlapping occurrences of a UTF-8 encoded substring within a UTF-8 encoded string, ignoring case.
Syntax¶
countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos])
Arguments¶
haystack: UTF-8 encoded string in which the search is performed. String or Enum.needle: UTF-8 encoded substring to be counted. String.start_pos: Position (1-based) inhaystackat which the search starts. UInt. Optional.
Returns¶
The total number of times the needle substring appears in the haystack (UTF-8, case-insensitive). UInt64.
Example¶
SELECT countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА')
Result:
┌─countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА')─┐
│ 4 │
└────────────────────────────────────────────────────────────────────┘
Example with start_pos argument:
SELECT countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА', 13)
Result:
┌─countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА', 13)─┐
│ 2 │
└────────────────────────────────────────────────────────────────────────┘
countMatches¶
Counts the number of times a regular expression pattern matches within a string.
Syntax¶
countMatches(haystack, pattern)
Arguments¶
haystack: The string to search within. String.pattern: The regular expression pattern to match, using re2 regular expression syntax. String.
Returns¶
The total number of matches found for the pattern in the haystack. UInt64.
Example¶
SELECT countMatches('foobar.com', 'o+')
Result:
┌─countMatches('foobar.com', 'o+')─┐
│ 2 │
└──────────────────────────────────┘
SELECT countMatches('aaaa', 'aa')
Result:
┌─countMatches('aaaa', 'aa')────┐
│ 2 │
└───────────────────────────────┘
countMatchesCaseInsensitive¶
Counts the number of times a regular expression pattern matches within a string, ignoring case.
Syntax¶
countMatchesCaseInsensitive(haystack, pattern)
Arguments¶
haystack: The string to search within. String.pattern: The regular expression pattern to match, using re2 regular expression syntax. String.
Returns¶
The total number of matches found for the pattern in the haystack, ignoring case. UInt64.
Example¶
SELECT countMatchesCaseInsensitive('AAAA', 'aa')
Result:
┌─countMatchesCaseInsensitive('AAAA', 'aa')────┐
│ 2 │
└──────────────────────────────────────────────┘
regexpExtract¶
Extracts a specific capturing group from the first match of a regular expression in a string.
Syntax¶
regexpExtract(haystack, pattern[, index])
Alias: REGEXP_EXTRACT(haystack, pattern[, index]).
Arguments¶
haystack: The string to search within. String.pattern: The regular expression pattern with capturing groups, using re2 regular expression syntax. String.index: An integer (0-based) specifying which capturing group to extract. 0 extracts the entire match. Defaults to 1. UInt or Int. Optional.
Returns¶
The extracted substring corresponding to the specified capturing group. String.
Example¶
SELECT
regexpExtract('100-200', '(\\d+)-(\\d+)', 1),
regexpExtract('100-200', '(\\d+)-(\\d+)', 2),
regexpExtract('100-200', '(\\d+)-(\\d+)', 0),
regexpExtract('100-200', '(\\d+)-(\\d+)')
Result:
┌─regexpExtract('100-200', '(\\d+)-(\\d+)', 1)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)', 2)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)', 0)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)')─┐
│ 100 │ 200 │ 100-200 │ 100 │
└──────────────────────────────────────────────┴──────────────────────────────────────────────┴──────────────────────────────────────────────┴───────────────────────────────────────────┘
hasSubsequence¶
Checks if one string is a subsequence of another. A subsequence can be derived by deleting zero or more characters from the original string without changing the order of the remaining characters.
Syntax¶
hasSubsequence(haystack, needle)
Arguments¶
haystack: The string to search within. String.needle: The subsequence to search for. String.
Returns¶
Returns 1 if needle is a subsequence of haystack, otherwise returns 0. UInt8.
Example¶
SELECT hasSubsequence('garbage', 'arg')
Result:
┌─hasSubsequence('garbage', 'arg')─┐
│ 1 │
└──────────────────────────────────┘
hasSubsequenceCaseInsensitive¶
Checks if one string is a subsequence of another, ignoring case differences.
Syntax¶
hasSubsequenceCaseInsensitive(haystack, needle)
Arguments¶
haystack: The string to search within. String.needle: The subsequence to search for. String.
Returns¶
Returns 1 if needle is a subsequence of haystack (case-insensitive), otherwise returns 0. UInt8.
Example¶
SELECT hasSubsequenceCaseInsensitive('garbage', 'ARG')
Result:
┌─hasSubsequenceCaseInsensitive('garbage', 'ARG')─┐
│ 1 │
└─────────────────────────────────────────────────┘
hasSubsequenceUTF8¶
Checks if one UTF-8 encoded string is a subsequence of another UTF-8 encoded string.
Syntax¶
hasSubsequenceUTF8(haystack, needle)
Arguments¶
haystack: The UTF-8 encoded string to search within. String.needle: The UTF-8 encoded subsequence to search for. String.
Returns¶
Returns 1 if needle is a subsequence of haystack (UTF-8), otherwise returns 0. UInt8.
Example¶
SELECT hasSubsequenceUTF8('ClickHouse - столбцовая система управления базами данных', 'система')
Result:
┌─hasSubsequenceUTF8('ClickHouse - столбцовая система управления базами данных', 'система')─┐
│ 1 │
└───────────────────────────────────────────────────────────────────────────────────────────┘
hasSubsequenceCaseInsensitiveUTF8¶
Checks if one UTF-8 encoded string is a subsequence of another UTF-8 encoded string, ignoring case.
Syntax¶
hasSubsequenceCaseInsensitiveUTF8(haystack, needle)
Arguments¶
haystack: The UTF-8 encoded string to search within. String.needle: The UTF-8 encoded subsequence to search for. String.
Returns¶
Returns 1 if needle is a subsequence of haystack (UTF-8, case-insensitive), otherwise returns 0. UInt8.
Example¶
SELECT hasSubsequenceCaseInsensitiveUTF8('ClickHouse - столбцовая система управления базами данных', 'СИСТЕМА')
Result:
┌─hasSubsequenceCaseInsensitiveUTF8('ClickHouse - столбцовая система управления базами данных', 'СИСТЕМА')─┐
│ 1 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
hasToken¶
Checks if a specific token (a maximal length substring between non-alphanumeric ASCII characters or string boundaries) is present in a string.
Syntax¶
hasToken(haystack, token)
Arguments¶
haystack: String in which the search is performed. String or Enum.token: The token to search for. String.
Returns¶
Returns 1 if the token is found in the haystack, otherwise returns 0. UInt8.
Example¶
SELECT hasToken('Hello World','Hello')
Result:
1
hasTokenOrNull¶
Checks if a specific token is present in a string, returning 1 if found, 0 if not found, or null if the token is ill-formed.
Syntax¶
hasTokenOrNull(haystack, token)
Arguments¶
haystack: String in which the search is performed. String or Enum.token: The token to search for. String.
Returns¶
Returns 1 if the token is present, 0 if not present, or null if the token is ill-formed. Nullable(UInt8).
Example¶
Where hasToken would throw an error for an ill-formed token, hasTokenOrNull returns null for an ill-formed token.
SELECT hasTokenOrNull('Hello World','Hello,World')
Result:
null
hasTokenCaseInsensitive¶
Checks if a specific token is present in a string, ignoring case differences.
Syntax¶
hasTokenCaseInsensitive(haystack, token)
Arguments¶
haystack: String in which the search is performed. String or Enum.token: The token to search for. String.
Returns¶
Returns 1 if the token is found in the haystack (case-insensitive), otherwise returns 0. UInt8.
Example¶
SELECT hasTokenCaseInsensitive('Hello World','hello')
Result:
1
hasTokenCaseInsensitiveOrNull¶
Checks if a specific token is present in a string, ignoring case, returning 1 if found, 0 if not found, or null if the token is ill-formed.
Syntax¶
hasTokenCaseInsensitiveOrNull(haystack, token)
Arguments¶
haystack: String in which the search is performed. String or Enum.token: The token to search for. String.
Returns¶
Returns 1 if the token is present (case-insensitive), 0 if not present, or null if the token is ill-formed. Nullable(UInt8).
Example¶
Where hasTokenCaseInsensitive would throw an error for an ill-formed token, hasTokenCaseInsensitiveOrNull returns null for an ill-formed token.
SELECT hasTokenCaseInsensitiveOrNull('Hello World','hello,world')
Result:
null