Search code examples
sqlregexoracle-databasesubstr

SQL parse column string for numerical data and store in separate columns


I am using Oracle SQL and want to parse from a single column of strings generally like

suitjacket 899, height195cm weight80kg male
blazerjacket 1099, height170cm weight65kg female
pants 299, height160cm weight89kgs male
coat 1099, height165cms weight60.5kg female

Note that sometimes the units are plural with the extra 's' at the end. How can I extract the height and weight data into their own separate columns using an Oracle-compliant SQL query?

| height | weight |
-------------------
|   195  |   80   |
|   170  |   65   |
|   160  |   89   |
|   165  |   60.5 |

I think I have to use either substring or regex. substring doesn't seem appropriate since the data positions in the string are not fixed. regex seems better, but I'm not sure how to make the regex and then store it in separate columns, one for the height and one for the weight.


Solution

  • Given a sample setup:

    CREATE TABLE tbl
        (field varchar(48));
    
    INSERT INTO tbl (field) VALUES ('suitjacket 899, height195cm weight80kg male');
    INSERT INTO tbl (field) VALUES ('blazerjacket 1099, height170cm weight65kg female');
    INSERT INTO tbl (field) VALUES ('pants 299, height160cm weight89kgs male');
    INSERT INTO tbl (field) VALUES ('coat 1099, height165cms weight60.5kg female');
    

    You could use a combination of the regexp_substr and replace functions, like so:

    select to_number(regexp_substr(field, 'height(\d+\.?\d+)', 1, 1, 'i', 1)) height,
           to_number(regexp_substr(field, 'weight(\d+\.?\d+)', 1, 1, 'i', 1)) weight
      from tbl;
    

    Here is a demo of the above. And here are some references: regexp_substr, replace, regex cheat sheet

    Good luck!