Search code examples
sqlmysql

parse field (possibly using regex) in mysql when selecting


I have a MySQL (MySQL v8) table where one column has a long string of information, but when I select it, I need to retrieve only a specific part of the long string.

I need as part of the response the value that comes immediately before the " - " part, and after the <Br> part (unless the beginning of the string, in which case there is no <Br>)

create table codes (
    id integer primary key
    desc text not null
);

insert into codes values (1, '423.9 - This is the first item<Br>511.9 - This is the second item<Br>486 - This is the third item (486.0 actually)<Br>42.a.7.31 - <Br>785.0 - 785.? unsure if decimal is correct')

insert into codes values (2, '428.0M - CHF NOS')

insert into codes values (3, 'E.423.9 - Just testing!<Br>511.9 - Another test 12345')

I need the part of the description immediately before the ' - ' part, for example after "select ... from codes" I would have this:

423.9, 511.9, 486, 42.a.7.31, 785.0
428.0M
E.423.9, 511.9

Solution

  • Ideally you should do this cleanup operation in some scripting language (e.g. Python, Perl) outside of MySQL, before you import your data into the database. That being said, on MySQL 8+, we could try using REGEXP_REPLACE() for a regex option to extract out the terms wanted:

    SELECT id, `desc`,
           REGEXP_REPLACE(REGEXP_REPLACE(`desc`, '([^ ]+).*?(<Br>|$)', '$1, '), ', $', '') AS output
    FROM codes;
    

    The regex pattern used here matches:

    • ([^ ]+) match AND capture in $1 the first non whitespace term
    • .*? match until reaching the nearest
    • (<Br>|$) <Br> tag OR end of the string

    Then, we replace with just $1 followed by commma and space, to build out the desired CSV string. Note that the outer call to REGEXP_REPLACE is to strip off the final dangling comma.

    Demo