
How to remove everything after a certain keyword from email body using regex?

I wrote the code below to grab specific values from a particular email: the date, the index value (SGEPSBSH), and the bbg level.

I'm trying to save the result to a pandas DataFrame. Before saving the email body, I want to remove everything after the client signature, starting from the keyword "Regards".

I get the following error:

File "", line 39, in <module>
    Body_content = message.body
File "", line 473, in __getattr__
    raise AttributeError("'%s' object has no attribute '%s'" % (repr(self), attr))
AttributeError: '<Library._MailItem instance at 0x2473706520480>' object has no attribute 'body'

Can you please help me fix my code?

import win32com.client
import re
import os
import pandas
import datetime
from datetime import date

EMAIL_CONTNT = {'Ticker': [], 'TickerLevel': [], 'DATE': []}
out_app = win32com.client.gencache.EnsureDispatch("Outlook.Application")
out_namespace = out_app.GetNamespace("MAPI")

root_folder = out_namespace.GetDefaultFolder(6)
out_iter_folder = root_folder.Folders['Email_snapper']
char_length_of_search_substring = len(EMAIL_SUBJ_SEARCH_STRING)
item_count = out_iter_folder.Items.Count
Flag = False
cnt = 0
if out_iter_folder.Items.Count > 0:
    for i in range(item_count, 0, -1)[:2]:
        message = out_iter_folder.Items[i]
        #message = message.Restrict("[ReceivedTime] >= '" + lastWeekDateTime + "'")
Body_content = message.body
message.body = re.sub(r".*Regards[^\n]+\n[^\n]+", "",message.body)



  • If you are not set on using regex, simple string slicing might work for you as well:

    s = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus tincidunt elit in ex " \
        "molestie euismod sed et velit. Aenean blandit placerat sodales. Curabitur mattis nibh nec " \
        "leo hendrerit commodo. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras eu " \
        "mattis dui, at convallis dolor."
    s = s[:s.find("amet")].strip()


    Lorem ipsum dolor sit
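
    Applied to the email case, both approaches can be sketched as below. This is a minimal sketch, not a drop-in fix: SAMPLE_BODY and the two helper names are made up for illustration, and the text would really come from the Outlook item. Note that str.find returns -1 when the keyword is missing, so the plain slice above would silently drop the last character; the slicing helper guards against that. Separately, the AttributeError in the question is most likely a capitalisation issue: wrappers generated by gencache.EnsureDispatch are case-sensitive, so the property is probably message.Body rather than message.body (an assumption I could not test here).

```python
import re

# Hypothetical sample text; in the question this would come from the mail item.
SAMPLE_BODY = (
    "Hi team,\n"
    "SGEPSBSH closed at 101.25 on 2021-03-01.\n"
    "\n"
    "Regards,\n"
    "Jane Doe\n"
    "Example Corp\n"
)

# re.DOTALL lets "." match newlines, so ".*" runs to the end of the text.
SIGNATURE_RE = re.compile(r"\bRegards\b.*", flags=re.DOTALL)


def strip_signature(body: str) -> str:
    """Regex version: drop everything from the first 'Regards' onward."""
    return SIGNATURE_RE.sub("", body).rstrip()


def strip_signature_slice(body: str, keyword: str = "Regards") -> str:
    """Slicing version: same idea without regex."""
    idx = body.find(keyword)
    # str.find returns -1 when the keyword is absent; keep the body unchanged.
    return body if idx == -1 else body[:idx].rstrip()


if __name__ == "__main__":
    print(strip_signature(SAMPLE_BODY))
    print(strip_signature_slice(SAMPLE_BODY))
```

    The regex version only cuts at "Regards" as a whole word, while the slicing version cuts at the first occurrence of the substring, so they can differ on text such as "Disregards".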