Search code examples
sqloraclenlpregexp-substr

Extract string data between specific points in Oracle SQL


Example of Text:

PROCEDURE:                          Cryo balloon antral pulmonary vein
isolation and cavotricuspid isthmus ablation.

The patient is a middle aged and happy.

I am trying to extract "Cryo balloon antral pulmonary vein isolation and cavotricuspid isthmus ablation" from the text.

The Code I used: TRIM(REGEXP_SUBSTR(a.Document_Text, 'Procedure:\s*(.*)\s*?\.',1,1,'inm',1)) as Text_Procedure_DESC

But what I get is the whole note after 'Procedure:'

FYI:

  • There are text fields Prior to and after the Procedure entry.
  • There are multiple spaces and vary from note to note.

Not sure where I messed up!


Solution

    • You want to make the .* pattern non-greedy by using .*?
    • The m option is redundant (as you are not matching the start or end of lines)
    • and you don't need the TRIM function.

    Like this:

    SELECT REGEXP_SUBSTR(document_text, 'PROCEDURE:\s+(.*?)\.', 1, 1, 'in' , 1)
             AS Text_Procedure_DESC
    FROM   table_name;
    

    Which, for the sample data:

    CREATE TABLE table_name (value) AS
    SELECT 'PROCEDURE:                          Cryo balloon antral pulmonary vein
    isolation and cavotricuspid isthmus ablation.
    
    The patient is a middle aged and happy.' FROM DUAL
    

    Outputs:

    TEXT_PROCEDURE_DESC
    Cryo balloon antral pulmonary vein
    isolation and cavotricuspid isthmus ablation

    db<>fiddle here