Search code examples
sqlregexoracle-databaseregexp-replace

Remove text from String in Oracle


I have a table with 2 columns.(column 2 is basically an append list of multiple emails)

Caseid Text
101 FROM: ABHI RATHI <(><<)>ABHIRATHI@XYZ.COM> SENT: FRIDAY, JUNE 17, 2022 12:33 PM TO: VR, SARATH <(><<)>SARATH.VR@ABC.COM>; SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] FROM: YUAN LN <(><<)>YIQUANLIN@XYZ.COM> SENT: FRIDAY, JUNE 10, 2022 3:40 PM TO: VR, SARATH <(><<)>SARTH.VIR@ABC.COM> SINGH, ROHINI <(><<)>ROSHINI.SINGH@PQR.COM>SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN

REQUIRED OUTPUT:

CaseId Text
101 SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ]. SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN

Remove from 'FROM:' till '>' before 'subject'. Only keep the contents and remove unnecessary text

I tried this :

with test as
(
 select 'FROM: ABHI RATHI <(><<)>ABHIRATHI@XYZ.COM> SENT: FRIDAY, JUNE 17, 2022 12:33 PM TO: VR, SARATH <(><<)>SARATH.VR@ABC.COM>; SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] FROM: YUAN LN <(><<)>YIQUANLIN@XYZ.COM> SENT: FRIDAY, JUNE 10, 2022 3:40 PM TO: VR, SARATH <(><<)>SARTH.VIR@ABC.COM> SINGH, ROHINI <(><<)>ROSHINI.SINGH@PQR.COM>SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN' as text from dual
 )
select regexp_replace(text, 'FROM:[[:print:]]*[\.][a-zA-Z]{2,4}[\>]','' )
from test

Got output:

SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN

Issue: Contents of first email is missing.

required output:

SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN


Solution

  • You may use non-greedy (indicated by a question mark) match between FROM and SUBJECT and replace this match with SUBJECT:

    with sample(caseid, val) as (
      select 101, q'^FROM: ABHI RATHI <(><<)>ABHIRATHI@XYZ.COM> SENT: FRIDAY, JUNE 17, 2022 12:33 PM TO: VR, SARATH <(><<)>SARATH.VR@ABC.COM>; SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] FROM: YUAN LN <(><<)>YIQUANLIN@XYZ.COM> SENT: FRIDAY, JUNE 10, 2022 3:40 PM TO: VR, SARATH <(><<)>SARTH.VIR@ABC.COM> SINGH, ROHINI <(><<)>ROSHINI.SINGH@PQR.COM>SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN^'
      from dual
    )
    select
      caseid
      , regexp_replace(val, 'FROM.+?(SUBJECT)', '\1') as replaced
      , val
    from sample
    
    CASEID REPLACED VAL
    101 SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN FROM: ABHI RATHI <(><<)>ABHIRATHI@XYZ.COM> SENT: FRIDAY, JUNE 17, 2022 12:33 PM TO: VR, SARATH <(><<)>SARATH.VR@ABC.COM>; SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] FROM: YUAN LN <(><<)>YIQUANLIN@XYZ.COM> SENT: FRIDAY, JUNE 10, 2022 3:40 PM TO: VR, SARATH <(><<)>SARTH.VIR@ABC.COM> SINGH, ROHINI <(><<)>ROSHINI.SINGH@PQR.COM>SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN

    fiddle

    But your sample data has extra punctuation between the last > and SUBJECT (semicolon in the first occurrence and nothing at the last occurrence), so you may need an extra match pattern for allowed set of characters.