python, regex, python-3.x, icalendar

Match all lines between two words that contain a specific string


I need help with a regex. I want to match all lines between BEGIN:VEVENT and END:VEVENT, but only if the string PARTSTAT=DECLINED appears somewhere between them. Below is a sample of the text containing 3 events (two of them contain PARTSTAT=DECLINED and one contains PARTSTAT=ACCEPTED). I would like to remove the declined events from my iCal file.

BEGIN:VEVENT
UID:040000008200E00074C5B7101A82E0080000000090E9AB1DA717D4010000000000000000
 10000000FF519C52170B604C82055C2922E0EA43
RRULE:FREQ=WEEKLY;BYDAY=MO
X-ALT-DESC;FMTTYPE=text/html:<html xmlns:v="urn:schemas-microsoft-com:vml" x
 mlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-micros
 oft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/om
 ml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-T
 ype content="text/html\; charset=iso-8859-2"><meta name=Generator content="M
 icrosoft Word 15 (filtered medium)"><style><!--\n/* Font Definitions */\n@fo
 nt-face\n{font-family:"Cambria Math"\;\npanose-1:2 4 5 3 5 4 6 3 2 4\;}\n@fo
 nt-face\n{font-family:Calibri\;\npanose-1:2 15 5 2 2 2 4 3 2 4\;}\n/* Style 
 Definitions */\np.MsoNormal\, li.MsoNormal\, div.MsoNormal\n{margin:0cm\;\nm
 argin-bottom:.0001pt\;\nfont-size:11.0pt\;\nfont-family:"Calibri"\,sans-seri
 f\;\nmso-fareast-language:EN-US\;}\na:link\, span.MsoHyperlink\n{mso-style-p
 riority:99\;\ncolor:#0563C1\;\ntext-decoration:underline\;}\na:visited\, spa
 n.MsoHyperlinkFollowed\n{mso-style-priority:99\;\ncolor:#954F72\;\ntext-deco
 ration:underline\;}\np.msonormal0\, li.msonormal0\, div.msonormal0\n{mso-sty
 le-name:msonormal\;\nmso-margin-top-alt:auto\;\nmargin-right:0cm\;\nmso-marg
 in-bottom-alt:auto\;\nmargin-left:0cm\;\nfont-size:12.0pt\;\nfont-family:"Ti
 mes New Roman"\,serif\;}\nspan.Stylwiadomocie-mail18\n{mso-style-type:person
 al-compose\;\nfont-family:"Calibri"\,sans-serif\;\ncolor:windowtext\;}\n.Mso
 ChpDefault\n{mso-style-type:export-only\;\nfont-size:10.0pt\;}\n@page WordSe
 <o:p></o:p></p></div></body></html>
LOCATION:[email protected]
ATTENDEE;[email protected];PARTSTAT=DECLINED:mailto:[email protected]
ATTENDEE;CN=Name Surname
PRIORITY:5
X-MICROSOFT-CDO-BUSYSTATUS:TENTATIVE
X-MICROSOFT-CDO-IMPORTANCE:1
X-MS-OLK-AUTOSTARTCHECK:FALSE
X-MS-OLK-CONFTYPE:0
SUMMARY:None
DTSTART;TZID="Europe/UK":19980615T110000
DTEND;TZID="Europe/UK":19980615T113000
STATUS:CONFIRMED
CLASS:PUBLIC
X-MICROSOFT-CDO-INTENDEDSTATUS:BUSY
TRANSP:OPAQUE
LAST-MODIFIED:20180709T150603Z
DTSTAMP:20180709T150602Z
SEQUENCE:0
BEGIN:VALARM
ACTION:DISPLAY
TRIGGER;RELATED=START:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
UID:040000008200E00074C5B7101A82E0080000000090D3C2088E0DD4010000000000000000
 1000000079086417F9C0F9478C1916D1A1E58267
X-ALT-DESC;FMTTYPE=text/html:<html xmlns:v="urn:schemas-microsoft-com:vml" x
 mlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-micros
 oft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/om
 ml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-T
 ype content="text/html\; charset=us-ascii"><meta name=Generator content="Mic
 rosoft Word 15 (filtered medium)"><style><!--\n/* Font Definitions */\n@font
 -face\n{font-family:"Cambria Math"\;\npanose-1:2 4 5 3 5 4 6 3 2 4\;}\n@font
 -face\n{font-family:Calibri\;\npanose-1:2 15 5 2 2 2 4 3 2 4\;}\n/* Style De
 finitions */\np.MsoNormal\, li.MsoNormal\, div.MsoNormal\n{margin:0cm\;\nmar
 gin-bottom:.0001pt\;\nfont-size:11.0pt\;\nfont-family:"Calibri"\,sans-serif\
 ;\nmso-fareast-language:EN-US\;}\na:link\, span.MsoHyperlink\n{mso-style-pri
 ority:99\;\ncolor:#0563C1\;\ntext-decoration:underline\;}\na:visited\, span.
 MsoHyperlinkFollowed\n{mso-style-priority:99\;\ncolor:#954F72\;\ntext-decora
 tion:underline\;}\np.msonormal0\, li.msonormal0\, div.msonormal0\n{mso-style
 nk="#954F72"><div class=WordSection1><p class=MsoNormal><o:p>&nbsp\;</o:p></
 p></div></body></html>
LOCATION:[email protected]
ATTENDEE;[email protected];PARTSTAT=ACCEPTED:mailto:[email protected]
PRIORITY:5
X-MICROSOFT-CDO-BUSYSTATUS:TENTATIVE
X-MICROSOFT-CDO-IMPORTANCE:1
X-MS-OLK-AUTOSTARTCHECK:FALSE
X-MS-OLK-CONFTYPE:0
SUMMARY:None
DTSTART;TZID="Europe/UK":20180628T103000
DTEND;TZID="Europe/UK":20180628T140000
STATUS:CONFIRMED
CLASS:PUBLIC
X-MICROSOFT-CDO-INTENDEDSTATUS:BUSY
TRANSP:OPAQUE
LAST-MODIFIED:20180626T184118Z
DTSTAMP:20180626T184118Z
SEQUENCE:0
BEGIN:VALARM
ACTION:DISPLAY
TRIGGER;RELATED=START:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
UID:040000008200E00074C5B7101A82E008000000008030BEAE1C0ED4010000000000000000
 100000008AEEB06CBD136945961F46812BD0D171
X-ALT-DESC;FMTTYPE=text/html:<html xmlns:v="urn:schemas-microsoft-com:vml" x
 mlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-micros
 oft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/om
 ml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-T
 ype content="text/html\; charset=windows-1250"><meta name=Generator content=
 "Microsoft Word 15 (filtered medium)"><style><!--\n/* Font Definitions */\n@
 font-face\n{font-family:"Cambria Math"\;\npanose-1:2 4 5 3 5 4 6 3 2 4\;}\n@
 font-face\n{font-family:Calibri\;\npanose-1:2 15 5 2 2 2 4 3 2 4\;}\n/* Styl
 e Definitions */\np.MsoNormal\, li.MsoNormal\, div.MsoNormal\n{margin:0cm\;\
 nmargin-bottom:.0001pt\;\nfont-size:11.0pt\;\nfont-family:"Calibri"\,sans-se
 rif\;\nmso-fareast-language:EN-US\;}\na:link\, span.MsoHyperlink\n{mso-style
 -priority:99\;\ncolor:#0563C1\;\ntext-decoration:underline\;}\na:visited\, s
 pan.MsoHyperlinkFollowed\n{mso-style-priority:99\;\ncolor:#954F72\;\ntext-de
 <o:p></o:p></p></div></body></html>
LOCATION:Sala 3.11
ATTENDEE;CN=Sala kon 3.11;PARTSTAT=DECLINED:mailto:sala_3.1
 [email protected]
PRIORITY:5
X-MICROSOFT-CDO-BUSYSTATUS:TENTATIVE
X-MICROSOFT-CDO-IMPORTANCE:1
X-MS-OLK-AUTOSTARTCHECK:FALSE
X-MS-OLK-CONFTYPE:0
SUMMARY:None
DTSTART;TZID="Europe/UK":19980615T110000
DTEND;TZID="Europe/UK":19980615T113000
STATUS:CONFIRMED
CLASS:PUBLIC
X-MICROSOFT-CDO-INTENDEDSTATUS:BUSY
TRANSP:OPAQUE
LAST-MODIFIED:20180627T114346Z
DTSTAMP:20180627T114346Z
SEQUENCE:0
BEGIN:VALARM
ACTION:DISPLAY
TRIGGER;RELATED=START:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
UID:040000008200E00074C5B7101A82E008000000008077B51A0819D4010000000000000000
 1000000027DB863B9FBE90468D3B3F888327EF15

Solution

  • Since the goal is to remove entries that contain PARTSTAT=DECLINED, the following does that by retaining only those with PARTSTAT=ACCEPTED:

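    import re
    # Capture each event together with its first PARTSTAT value, then keep only the accepted ones
    pattern = r'\b(BEGIN:VEVENT\b.*?\bPARTSTAT=(ACCEPTED|DECLINED)\b.*?\bEND:VEVENT)\b'
    print([m for m, s in re.findall(pattern, data, re.DOTALL) if s == 'ACCEPTED'])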
    

    For example, given:

    data = '''BEGIN:VEVENT SOME TEXT PARTSTAT=DECLINED END:VEVENT BEGIN:VEVENT SOME TEXT PARTSTAT=ACCEPTED END:VEVENT BEGIN:VEVENT SOME TEXT PARTSTAT=DECLINED END:VEVENT'''
    

    The above code will output:

    ['BEGIN:VEVENT SOME TEXT PARTSTAT=ACCEPTED END:VEVENT']
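
One caveat with the pattern above: it assumes every event contains a PARTSTAT parameter. If an event has none, the lazy `.*?` runs past its END:VEVENT into the next event. Below is a minimal sketch of an alternative that matches each event on its own and deletes it only when it contains PARTSTAT=DECLINED, leaving the rest of the file untouched. The names `VEVENT_RE`, `unfold` and `remove_declined` are made up for illustration, and the sketch assumes BEGIN:VEVENT and END:VEVENT each sit at the start of a line, as in the sample from the question.

```python
import re

# One whole VEVENT block, including its trailing line break.
# The lazy .*? stops at the first END:VEVENT, so a match never spans two events.
VEVENT_RE = re.compile(
    r'^BEGIN:VEVENT\r?\n.*?^END:VEVENT[ \t]*(?:\r?\n|\Z)',
    re.DOTALL | re.MULTILINE,
)

def unfold(text):
    # iCalendar folds long lines by inserting a line break followed by a
    # single space or tab; undo that so a folded PARTSTAT is still found.
    return re.sub(r'\r?\n[ \t]', '', text)

def remove_declined(ics_text):
    # Replace each declined event with an empty string, keep all others as-is.
    def keep_or_drop(match):
        event = match.group(0)
        return '' if 'PARTSTAT=DECLINED' in unfold(event) else event
    return VEVENT_RE.sub(keep_or_drop, ics_text)
```

Calling `remove_declined(data)` on the full calendar text returns the text without the declined events. Note that an event is dropped as soon as any one attendee has declined, which matches the condition stated in the question; if only a particular attendee's status should count, the check inside `keep_or_drop` would need to be narrowed.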