Search code examples
oracle-databasesubstring

Oracle substring function of variable length


I have a column containing filenames where I am trying to isolate the vaccine portion of the text. I need to cut off the beginning of the string up to the underscore. Then I need to cut of the file extension including the period, which could be four or five characters. I have the first part working, but it is not doing the second part. What am I missing? Should I use regexp instead of my 2nd instr? If so, how?

Here are the original values.

4212406_Meningitis.jpg
4824729_Hep-B.jpg
3612290_Hep-B.jpg
2811504_Covid-19.jpeg
621980_Covid-19.pdf
5258652_MMR.jpeg
5755663_Meningitis.png
2555841_Covid-19.PNG
2677160_MMR.jpg
2294961_MMR.jpg
SELECT original_field, 
       SUBSTR(original_field, INSTR(original_field, '_') + 1, INSTR(original_field, '.') -1) AS current_field
FROM my_table 

enter image description here


Solution

  • SUBSTR has the signature SUBSTR(value, start_position, substring_length) (rather than having the third argument as end position). You can use:

    SELECT original_field, 
           SUBSTR(original_field, start_pos, end_pos - start_pos) AS current_field
    FROM   (
      SELECT original_field,
             INSTR(original_field, '_') + 1 AS start_pos,
             INSTR(original_field, '.', -1) AS end_pos
      FROM   my_table
    );
    

    or:

    SELECT original_field, 
           SUBSTR(
             original_field,
             INSTR(original_field, '_') + 1,
             INSTR(original_field, '.', -1) - INSTR(original_field, '_') - 1
           ) AS current_field
    FROM   my_table;
    

    or, shorter to type but slower to execute:

    SELECT original_field, 
           REGEXP_SUBSTR(original_field, '_(.*)\.', 1, 1, NULL, 1) AS current_field
    FROM   my_table;
    

    Which, for the sample data:

    CREATE TABLE my_table ( original_field ) AS
    SELECT '4212406_Meningitis.jpg' FROM DUAL UNION ALL
    SELECT '12345_Small_Pox.version_2.jpg' FROM DUAL;
    

    All output:

    ORIGINAL_FIELD CURRENT_FIELD
    4212406_Meningitis.jpg Meningitis
    12345_Small_Pox.version_2.jpg Small_Pox.version_2

    Or, if you want the first . character instead of the last then:

    SELECT original_field, 
           SUBSTR(original_field, start_pos, end_pos - start_pos) AS current_field
    FROM   (
      SELECT original_field,
             INSTR(original_field, '_') + 1 AS start_pos,
             INSTR(original_field, '.') AS end_pos
      FROM   my_table
    );
    

    or:

    SELECT original_field, 
           SUBSTR(
             original_field,
             INSTR(original_field, '_') + 1,
             INSTR(original_field, '.') - INSTR(original_field, '_') - 1
           ) AS current_field
    FROM   my_table;
    

    or:

    SELECT original_field, 
           REGEXP_SUBSTR(original_field, '_(.*?)\.', 1, 1, NULL, 1) AS current_field
    FROM   my_table;
    

    Which all 3 output:

    ORIGINAL_FIELD CURRENT_FIELD
    4212406_Meningitis.jpg Meningitis
    12345_Small_Pox.version_2.jpg Small_Pox

    fiddle