I have a table with 2 columns.(column 2 is basically an append list of multiple emails)
Caseid | Text |
---|---|
101 | FROM: ABHI RATHI <(><<)>ABHIRATHI@XYZ.COM> SENT: FRIDAY, JUNE 17, 2022 12:33 PM TO: VR, SARATH <(><<)>SARATH.VR@ABC.COM>; SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] FROM: YUAN LN <(><<)>YIQUANLIN@XYZ.COM> SENT: FRIDAY, JUNE 10, 2022 3:40 PM TO: VR, SARATH <(><<)>SARTH.VIR@ABC.COM> SINGH, ROHINI <(><<)>ROSHINI.SINGH@PQR.COM>SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN |
REQUIRED OUTPUT:
CaseId | Text |
---|---|
101 | SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ]. SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN |
Remove from 'FROM:' till '>' before 'subject'. Only keep the contents and remove unnecessary text
I tried this :
with test as
(
select 'FROM: ABHI RATHI <(><<)>ABHIRATHI@XYZ.COM> SENT: FRIDAY, JUNE 17, 2022 12:33 PM TO: VR, SARATH <(><<)>SARATH.VR@ABC.COM>; SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] FROM: YUAN LN <(><<)>YIQUANLIN@XYZ.COM> SENT: FRIDAY, JUNE 10, 2022 3:40 PM TO: VR, SARATH <(><<)>SARTH.VIR@ABC.COM> SINGH, ROHINI <(><<)>ROSHINI.SINGH@PQR.COM>SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN' as text from dual
)
select regexp_replace(text, 'FROM:[[:print:]]*[\.][a-zA-Z]{2,4}[\>]','' )
from test
Got output:
SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN
Issue: Contents of first email is missing.
required output:
SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN
You may use non-greedy (indicated by a question mark) match between FROM
and SUBJECT
and replace this match with SUBJECT
:
with sample(caseid, val) as (
select 101, q'^FROM: ABHI RATHI <(><<)>ABHIRATHI@XYZ.COM> SENT: FRIDAY, JUNE 17, 2022 12:33 PM TO: VR, SARATH <(><<)>SARATH.VR@ABC.COM>; SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] FROM: YUAN LN <(><<)>YIQUANLIN@XYZ.COM> SENT: FRIDAY, JUNE 10, 2022 3:40 PM TO: VR, SARATH <(><<)>SARTH.VIR@ABC.COM> SINGH, ROHINI <(><<)>ROSHINI.SINGH@PQR.COM>SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN^'
from dual
)
select
caseid
, regexp_replace(val, 'FROM.+?(SUBJECT)', '\1') as replaced
, val
from sample
CASEID | REPLACED | VAL |
---|---|---|
101 | SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN | FROM: ABHI RATHI <(><<)>ABHIRATHI@XYZ.COM> SENT: FRIDAY, JUNE 17, 2022 12:33 PM TO: VR, SARATH <(><<)>SARATH.VR@ABC.COM>; SUBJECT: RE: LOG # 20092017656 : [ SECURITY TEAM IDENTIFY THERE IS SUSPICION ACTIVITY ] [ ACTIONS REQUIRED ] FROM: YUAN LN <(><<)>YIQUANLIN@XYZ.COM> SENT: FRIDAY, JUNE 10, 2022 3:40 PM TO: VR, SARATH <(><<)>SARTH.VIR@ABC.COM> SINGH, ROHINI <(><<)>ROSHINI.SINGH@PQR.COM>SUBJECT: RE: LOG # 20092017656 : HI SARTH SORRY FOR THE LATE REPLY AS I AM ON LEAVE TODAY CAN WE HAVE A CALL NEXT MONDAY? THANKS AND REGARDS YUAN |
But your sample data has extra punctuation between the last >
and SUBJECT
(semicolon in the first occurrence and nothing at the last occurrence), so you may need an extra match pattern for allowed set of characters.