Search code examples
sqlregexdatagripexasol

Regular expression - capture number between underscores within a sequence between commas


I have a field in a database table in the format:

111_2222_33333,222_444_3,aaa_bbb_ccc

This is format is uniform to the entire field. Three underscore separated numeric values, a comma, three more underscore separated numeric values, another comma and then three underscore separated text values. No spaces in between

I want to extract the middle value from the second numeric sequence, in the example above I want to get 444

In a SQL query I inherited, the regex used is ^.,(\d+)_.$ but this doesn't seem to do anything.

I've tried to identify the first comma, first number after and the following underscore ,222_ to use as a starting point and from there get the next number without the _ after it

This (,\d*_)(\d+[^_]) selects ,222_444 and is the closest I've gotten


Solution

  • We can try using REGEXP_REPLACE with a capture group:

    SELECT
        REGEXP_REPLACE( 
            '111_2222_33333,222_444_3,aaa_bbb_ccc',
            '^[^,]+,[^_]+_(.*?)_[^_]+,.*$',
            '\1') AS num
    FROM yourTable;
    

    Here is a demo showing that the above regex' first capture group contains the quantity you want.

    Demo