Regular expression functions¶
All of the regular expression functions use the Java pattern syntax, with a few notable exceptions:
Feature | Description |
---|---|
Multi-line mode | Only \n is recognized as a line terminator when using multi-line mode ((?m) flag). The (?d) flag is not supported and should not be used. |
Case-insensitive matching | Case-insensitive matching ((?i) flag) is always performed in a Unicode-aware manner. Context-sensitive and local-sensitive matching is not supported. The (?u) flag is not supported. |
Surrogate pairs | Surrogate pairs are not supported. For example, \uD800\uDC00 must be specified as \x{10000} . |
Boundaries | Boundaries (\b ) are incorrectly handled for a non-spacing mark without a base character. |
\Q and \E in character classes |
\Q and \E are treated as literals and not supported in character classes (such as [A-Z123] ). |
Unicode character classes | - Underscores in names must be removed. Example: OldItalic instead of Old_Italic . - Scripts must be specified directly without prefixes. Example: Hiragana . |
Blocks | Blocks must be specified with the In prefix. Example: \p{Mongolian} . The block= and blk= prefixes are not supported. |
Categories | Categories must be specified directly without prefixes. Example: \p{L} . |
Binary properties | Binary properties must be specified directly without prefixes. Example: \p{NoncharacterCodePoint} . |
regex_count()
¶
regexp_extract_all
¶
Function | Description | Return Value |
---|---|---|
regexp_extract_all(string, pattern) |
Returns the substring(s) matched by the regular expression pattern in string . |
Array of substrings |
regexp_extract_all(string, pattern, group) |
Finds all occurrences of the regular expression pattern in string and returns the capturing group number. |
Array of substrings (group) |
Example:
SELECT regexp_extract_all('1a 2b 14m', '\d+');
-- [1, 2, 14]
SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2);
-- ['a', 'b', 'm']
regexp_extract()
¶
Function | Description | Return Value |
---|---|---|
regexp_extract(string, pattern) |
Returns the first substring matched by the regular expression pattern in string . |
varchar |
regexp_extract(string, pattern, group) |
Finds the first occurrence of the regular expression pattern in string and returns the capturing group number group . |
varchar |
Example:
SELECT regexp_extract('1a 2b 14m', '\d+');
-- 1
SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2);
-- 'a'
regexp_like()
¶
Function | Description | Return Value |
---|---|---|
regexp_like(string, pattern) |
Evaluates the regular expression pattern and determines if it is contained within string . |
boolean |
Example:
regexp_position()
¶
Function | Description | Return Value |
---|---|---|
regexp_position(string, pattern) |
Returns the index of the first occurrence (counting from 1) of pattern in string . Returns -1 if not found. |
integer |
regexp_position(string, pattern, start) |
Returns the index of the first occurrence of pattern in string , starting from start (inclusive). Returns -1 if not found. |
integer |
regexp_position(string, pattern, start, occurrence) |
Returns the index of the nth occurrence of pattern in string , starting from start (inclusive). Returns -1 if not found. |
integer |
Example:
SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b');
-- 8
SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12);
-- 19
regexp_replace()
¶
Function | Description | Return Value |
---|---|---|
regexp_replace(string, pattern) |
Removes every instance of the substring matched by the regular expression pattern from string . |
varchar |
regexp_replace(string, pattern, replacement) |
Replaces every instance of the substring matched by the regular expression pattern in string with replacement . |
varchar |
regexp_replace(string, pattern, function) |
Replaces every instance of the substring matched by the regular expression pattern in string using function . |
varchar |
Example:
SELECT regexp_replace('1a 2b 14m', '\d+[ab] ');
-- '14m'
SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 ');
-- '3ca 3cb 14m'
SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2]));
--'New York'