Cloud

regexp_replace

The regexp_replace() function replaces all occurrences of a regular expression pattern in a string with a specified replacement string.

Syntax

The syntax for regexp_replace() function is:

REGEXP_REPLACE(source_string, pattern, replacement, [flags])

Parameters

  • source_string: String on which to perform the replacement.

  • pattern: POSIX regular expression pattern to match.

  • replacement: Replacement string.

  • flags: Optional. Flags that change the matching behavior of regexp_replace().

The flags parameter is an optional string that controls how the function operates. Here is a list of flags supported in Redpanda SQL:

  • g: Global replacement. This flag ensures that all occurrences of the pattern are replaced.

  • i: Use this flag for case-insensitive matching.

  • c: Force case-sensitive matching (the default). Including c has the same effect as omitting all flags.

Examples

Basic usage

This example uses the regexp_replace() function with a basic POSIX regular expression pattern:

SELECT REGEXP_REPLACE('The SQL ENGINE supports various data types', 'T[^ ]*', 'A') AS "Replaced_String";

The query returns:

 Replaced_String
-----------------------------------------
 A SQL ENGINE supports various data types

The pattern T[^ ]* matches any substring starting with T followed by non-space characters. The function replaces the first match (The) with the replacement string A.

Replace special characters

This example demonstrates how to replace a non-alphanumeric character in a string with a tilde (~):

SELECT REGEXP_REPLACE('Hello World!', '[^A-Za-z0-9 ]', '~') AS "Replaced_String";

In this query, the second parameter is a regular expression [^A-Za-z0-9 ] that matches any character that is not an uppercase or lowercase letter, digit, or space. Output:

 Replaced String
-------------------
 Hello World~

Flag usage

Replace certain substrings with a single flag defined

This example uses the regexp_replace() function with a defined flag to replace certain substrings in a string. First, create a sample quotes table:

CREATE TABLE quotes (quotes_text text);
INSERT INTO quotes (quotes_text)
VALUES ('Work hard and stay hungry. Lazy people get nowhere in life.'),
       ('An excuse is a way for a LAZY person to feel better.'),
       ('The word LUCKY is how a lazy person describes someone who works hard.');

SELECT quotes_text FROM quotes;

The query returns:

                              quotes_text
-----------------------------------------------------------------------
 Work hard and stay hungry. Lazy people get nowhere in life.
 An excuse is a way for a LAZY person to feel better.
 The word LUCKY is how a lazy person describes someone who works hard.
(3 rows)

Now, use the regexp_replace() function with the i flag specified to replace all occurrences of the word lazy with active regardless of case sensitivity:

SELECT quotes_text, REGEXP_REPLACE(quotes_text, 'lazy', 'active', 'i') AS "New quotes" FROM quotes;

In this case, all occurrences of the word lazy have been replaced with active:

                              quotes_text                              |                               New quotes
-----------------------------------------------------------------------+-------------------------------------------------------------------------
 Work hard and stay hungry. Lazy people get nowhere in life.           | Work hard and stay hungry. active people get nowhere in life.
 An excuse is a way for a LAZY person to feel better.                  | An excuse is a way for a active person to feel better.
 The word LUCKY is how a lazy person describes someone who works hard. | The word LUCKY is how a active person describes someone who works hard.
(3 rows)

Specify one or more flags

Without specifying the g flag, regexp_replace() function replaces only the first occurrence of a substring:

SELECT REGEXP_REPLACE('ab12c', '[0-9]', 'X');
 regexp_replace
----------------
 abX2c

In this case, only the first digit (1) was replaced with X. By adding the g flag, all occurrences are replaced with X:

SELECT REGEXP_REPLACE('ab12c', '[0-9]', 'X', 'g');
 regexp_replace
----------------
 abXXc

If you specify multiple flags, the last one takes precedence. With ci the match is case-insensitive; with ic it is case-sensitive.

Restrictions

  • The function returns NULL if there are no input rows or NULL values.

  • If the regular expression pattern isn’t found in the string, the regexp_replace() function returns the original string.