I have a string that (I think) has a BOM inside of it, and I would like to remove every BOM without messing with the formatting.
For example, my string looks like this:
>=20
> =EF=BB=BF
>=20
> -Jeff
>=20
> Begin forwarded message:
>=20
And I would like it to look like:
>
>
>
> -Jeff
>
> Begin forwarded message:
>
I am fine with the > being left to indicate indentation; I just want the stray characters removed. If I decode the message, I get a string that is uglier and harder to read than what I already have. It has a bunch of \r\n\r\n in it from the line breaks, so ideally I'd like to just remove the things mentioned and leave the format alone.
Edit 1: Here is how I am getting to this point:
def getEmails():
    LOG.debug("Starting to get emails")
    conn = connectToMailServers()
    conn.select('inbox', readonly=True)
    result, data = conn.search(None, '(UNSEEN)')
    mail_ids = data[0]
    id_list = mail_ids.split()
    for _, i in enumerate(id_list):
        result, data = conn.fetch(str(int(i)), '(RFC822)')
        for response_part in data:
            if isinstance(response_part, tuple):
                msg = email.message_from_bytes(response_part[1])
                getPlainText(msg)

def getPlainText(msg):
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            LOG.debug(part.get_payload())
            return str(part.get_payload())
If I turn on decoding (part.get_payload(decode=True)), the string ends up with a bunch of \r\n\r\n in it. So how can I do this without decoding, or how can I reformat the decoded text into a formatted string without the extra line breaks?
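Explicitly telling the str converter to use UTF-8 worked:
str(getPlainText(msg), "utf-8")
This gave me the results I was looking for. (For this to work, getPlainText has to return the bytes from part.get_payload(decode=True), since str() only accepts an encoding argument for bytes-like input.)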
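Building on that, here is a minimal sketch of how the decoding and cleanup could be done in one place. It assumes the part is quoted-printable encoded, which is what the =20 and =EF=BB=BF sequences suggest (=EF=BB=BF being the UTF-8 BOM bytes). The name get_plain_text and the sample message are made up for illustration; they are not from the code above.

```python
import email


def get_plain_text(msg):
    """Return the first text/plain part as a cleaned-up str, or None.

    Hypothetical helper: a variation on getPlainText from the question.
    """
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes the transfer encoding (e.g. quoted-printable)
            # and returns bytes rather than str.
            raw = part.get_payload(decode=True)
            # Fall back to UTF-8 if the part does not declare a charset
            # (an assumption, matching the answer above).
            charset = part.get_content_charset() or "utf-8"
            text = raw.decode(charset, errors="replace")
            # A decoded BOM shows up as U+FEFF; drop every occurrence.
            text = text.replace("\ufeff", "")
            # Normalize \r\n line endings and strip the trailing spaces
            # that came from the =20 sequences.
            return "\n".join(line.rstrip() for line in text.splitlines())
    return None


# Made-up sample message resembling the text in the question.
sample = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b">=20\r\n"
    b"> =EF=BB=BF\r\n"
    b">=20\r\n"
    b"> -Jeff\r\n"
)
print(get_plain_text(email.message_from_bytes(sample)))
```

Note that decoding with "utf-8-sig" instead would only strip a BOM at the very start of the text, so the explicit replace is used here for BOMs that appear mid-string.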