Search code examples
sqloracle-databaseextractcpu-word

Oracle SQL - Extract some words only from a column?


I am working with Oracle SQL and I have a column with specific words that I want to extract. For example, I have:

MENACE DE MORT REITEREE EN RECIDIVE
VIOLENCES SUR CONJOINT

And I want to extract:

MENACE MORT
VIOLENCES CONJOINT

I've tried using regex_replace to replace specific words found in my list of words, but I have not been able to extract only the specific words. Is there a way to do this? I have about 50 specific words to extract.

Thank you in advance for your help!


Solution

  • In any case you will need a list of words (and probably some order of the wrords) to do the job. One of the options is to create a list of words and store them in a table. In this answer I will create CTE (words) as a sample. But first some sample data to work with:

    WITH            --  S a m p l e   D a t a :
        tbl (ID, TXT) AS
            (   
                Select 1, 'MENACE DE MORT REITEREE EN RECIDIVE' From Dual Union All
                Select 2, 'VIOLENCES SUR CONJOINT' From Dual Union All
                Select 3, 'MALTRAITANCE PSYCHOLOGIQUE SUR CONJOINT' From Dual
            ),
    

    ... and words list ...

        words_str AS
            (     
                Select '\MENACE\VIOLENCES\MALTRAITANCE\PSYCHOLOGIQUE\MORT\CONJOINT\' "WORDS_TO_RETURN" From Dual
            ), 
        words AS
            (
                Select  LEVEL "WORD_ID", SubStr(WORDS_TO_RETURN, InStr(WORDS_TO_RETURN, '\', 1, LEVEL) + 1, InStr(WORDS_TO_RETURN, '\', 1, LEVEL + 1) - InStr(WORDS_TO_RETURN, '\', 1, LEVEL) -1) "WORDS"
                From    words_str
                Connect By LEVEL <= Length(WORDS_TO_RETURN) - Length(Replace(WORDS_TO_RETURN, '\', '')) - 1
            )
    
    WORD_ID WORDS
    1 MENACE
    2 VIOLENCES
    3 MALTRAITANCE
    4 PSYCHOLOGIQUE
    5 MORT
    6 CONJOINT

    You can now join your table with the words and truncate your text to the list of words from above...

    --  M a i n   S Q L :
    SELECT  DISTINCT ID, TXT, 
            LISTAGG(WORDS, ' ') OVER(Partition By ID, TXT) "SHORT_TXT"
    FROM   (   Select  t.ID, t.TXT,  w.WORD_ID, w.WORDS
               From tbl t
               Inner Join  words w ON(  InStr('\' || Replace(t.TXT, ' ', '\') || '\' , '\' || w.WORDS || '\') > 0 )
               Order By t.ID, w.WORD_ID
            )
    ORDER BY ID
    

    Result should be ...

    ID TXT SHORT_TXT
    1 MENACE DE MORT REITEREE EN RECIDIVE MENACE MORT
    2 VIOLENCES SUR CONJOINT VIOLENCES CONJOINT
    3 MALTRAITANCE PSYCHOLOGIQUE SUR CONJOINT MALTRAITANCE PSYCHOLOGIQUE CONJOINT