Regular expression functions¶

All of the regular expression functions use the Java pattern syntax, with a few notable exceptions:

Feature	Description
Multi-line mode	Only `\n` is recognized as a line terminator when using multi-line mode (`(?m)` flag). The `(?d)` flag is not supported and should not be used.
Case-insensitive matching	Case-insensitive matching (`(?i)` flag) is always performed in a Unicode-aware manner. Context-sensitive and local-sensitive matching is not supported. The `(?u)` flag is not supported.
Surrogate pairs	Surrogate pairs are not supported. For example, `\uD800\uDC00` must be specified as `\x{10000}`.
Boundaries	Boundaries (`\b`) are incorrectly handled for a non-spacing mark without a base character.
`\Q` and `\E` in character classes	`\Q` and `\E` are treated as literals and not supported in character classes (such as `[A-Z123]`).
Unicode character classes	- Underscores in names must be removed. Example: `OldItalic` instead of `Old_Italic`. - Scripts must be specified directly without prefixes. Example: `Hiragana`.
Blocks	Blocks must be specified with the `In` prefix. Example: `\p{Mongolian}`. The `block=` and `blk=` prefixes are not supported.
Categories	Categories must be specified directly without prefixes. Example: `\p{L}`.
Binary properties	Binary properties must be specified directly without prefixes. Example: `\p{NoncharacterCodePoint}`.

`regex_count()`¶

Function	Description	Return Value
`regexp_count(string, pattern) → bigint`	Returns the number of occurrences of `pattern` in `string`.	bigint
Example:
`SELECT regexp_count('1a 2b 14m', '\s[a-z]+\s'); -- 3`

`regexp_extract_all`¶

Function	Description	Return Value
`regexp_extract_all(string, pattern)`	Returns the substring(s) matched by the regular expression `pattern` in `string`.	Array of substrings
`regexp_extract_all(string, pattern, group)`	Finds all occurrences of the regular expression `pattern` in `string` and returns the capturing group number.	Array of substrings (group)

Example:

SELECT regexp_extract_all('1a 2b 14m', '\d+');
-- [1, 2, 14]

SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2); 
-- ['a', 'b', 'm']

`regexp_extract()`¶

Function	Description	Return Value
`regexp_extract(string, pattern)`	Returns the first substring matched by the regular expression `pattern` in `string`.	varchar
`regexp_extract(string, pattern, group)`	Finds the first occurrence of the regular expression `pattern` in `string` and returns the capturing group number `group`.	varchar

Example:

SELECT regexp_extract('1a 2b 14m', '\d+'); 
-- 1
SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2); 
-- 'a'

`regexp_like()`¶

Function	Description	Return Value
`regexp_like(string, pattern)`	Evaluates the regular expression `pattern` and determines if it is contained within `string`.	boolean

Example:

SELECT regexp_like('1a 2b 14m', '\d+b'); -- true

`regexp_position()`¶

Function	Description	Return Value
`regexp_position(string, pattern)`	Returns the index of the first occurrence (counting from 1) of `pattern` in `string`. Returns -1 if not found.	integer
`regexp_position(string, pattern, start)`	Returns the index of the first occurrence of `pattern` in `string`, starting from `start` (inclusive). Returns -1 if not found.	integer
`regexp_position(string, pattern, start, occurrence)`	Returns the index of the nth occurrence of `pattern` in `string`, starting from `start` (inclusive). Returns -1 if not found.	integer

Example:

SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b'); 
-- 8
SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12); 
-- 19

`regexp_replace()`¶

Function	Description	Return Value
`regexp_replace(string, pattern)`	Removes every instance of the substring matched by the regular expression `pattern` from `string`.	varchar
`regexp_replace(string, pattern, replacement)`	Replaces every instance of the substring matched by the regular expression `pattern` in `string` with `replacement`.	varchar
`regexp_replace(string, pattern, function)`	Replaces every instance of the substring matched by the regular expression `pattern` in `string` using `function`.	varchar

Example:

SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); 
-- '14m'
SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 '); 
-- '3ca 3cb 14m'
SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2])); 
--'New York'

`regexp_split()`¶

Function	Description	Return Value
`regexp_split(string, pattern)`	Splits `string` using the regular expression `pattern` and returns an array. Trailing empty strings are preserved.	Array
Example:
`SELECT regexp_split('1a 2b 14m', '\s[a-z]+\s'); -- [1, 2, 14, ]`

Regular expression functions¶

regex_count()¶

regexp_extract_all¶

regexp_extract()¶

regexp_like()¶

regexp_position()¶

regexp_replace()¶

regexp_split()¶

`regex_count()`¶

`regexp_extract_all`¶

`regexp_extract()`¶

`regexp_like()`¶

`regexp_position()`¶

`regexp_replace()`¶

`regexp_split()`¶