Search code examples
sqlregexoracle-databaseextractcurrency

Regexp_replace WITH DECIMAL is returning error ORA-01722 "Invalid Number"


In Oracle I had this code:

regexp_replace(DESCRIPTION_FIELD, '[^0-9.]+', '') 

that worked fine, but now I'm getting the error:

ORA-01722 "Invalid Number".

Removing the decimal fixes this, but removes the decimal from the returning value. I'm guessing there's some bad data added recently. Solutions?


Solution

  • The regexp_replace() isn't generating that error; the problem is when you cast the result of that replacement to a number. For example, for the original value XYZ2626...266.88 your pattern brings back 2626...266.88, and to_number('2626...266.88') throws ORA-01722.

    by finding numbers with a format of "Some numbers, a decimal, then two numbers"

    You could look for that rather than trying to exclude other characters:

    with your_table (description_field) as (
      select 'No money value' from dual
      union all
      select 'Some sensible 98765.43 value' from dual
      union all
      select '01234-1234545 54.00' from dual
      union all
      select 'XYZ2626...266.88' from dual
      union all
      select 'ABC-123.45XYZ' from dual
      union all
      select 'ABC123.45XYZ6.78' from dual
    )
    select description_field,
      regexp_replace(DESCRIPTION_FIELD, '[^0-9.]+', '') as original,
      regexp_replace(DESCRIPTION_FIELD, '.*?((-?\d+\.\d{2})[^0-9]*)?$', '\2') as new
    from your_table;
    
    DESCRIPTION_FIELD            ORIGINAL             NEW                 
    ---------------------------- -------------------- --------------------
    No money value                                                        
    Some sensible 98765.43 value 98765.43             98765.43            
    01234-1234545 54.00          01234123454554.00    54.00               
    XYZ2626...266.88             2626...266.88        266.88              
    ABC-123.45XYZ                123.45               -123.45             
    ABC123.45XYZ6.78             123.456.78           6.78                
    

    I've allowed for negative numbers but you might not want those... and if there is more than one potential money values it'll take the last one.

    The capturing group (-?\d+\.\d{2}) looks for an optional minus sign, followed by any number of digits, followed by a period, followed by exactly 2 digits. But that on its own wouldn't prevent further digits afterwards, so it's followed by [^0-9]* to make sure that doesn't happen. That combination is enclosed in a second grouping to allow it to be optional (followed by ?) - otherwise values without anything that looks like a money amount are passed through unaltered, which will also error presumably.