Tags: python, email, unicode, character-encoding, rfc2231

Email an attachment with a non-ASCII filename using Python's email package


How can I send an email with a file attached where the file name contains unicode characters?

So far, the file arrives, but with the filename "noname".

This is the part that works perfectly well for ASCII filenames:

import os
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication
from email.utils import formatdate

# sender_email, kindle_emails and f (path to the .mobi file) are defined elsewhere.
msg = MIMEMultipart()
msg['Subject'] = "New magazine delivery!"
msg['From'] = sender_email
msg['To'] = ', '.join(kindle_emails)
msg['Date'] = formatdate(localtime=True)
message = "see attachment"
msg.attach(MIMEText(message))

# MIMEApplication already supplies the "application/" major type,
# so only the subtype is passed here.
with open(f, 'rb') as fp:
    part = MIMEApplication(fp.read(), _subtype='x-mobipocket-ebook')

part.add_header('Content-Disposition', 'attachment', filename=os.path.basename(f))
msg.attach(part)
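When debugging a "noname" attachment, it can help to serialize the message and parse it back the way a receiving program would, then ask the parser which filename it recovers. A minimal Python 3 sketch, assuming a placeholder ASCII filename and dummy payload bytes; the helper name `attachment_names` is hypothetical:

```python
import email
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def attachment_names(raw_message):
    """Return the filename a parser recovers for each non-multipart part."""
    parsed = email.message_from_string(raw_message)
    # get_filename() reads the filename parameter of Content-Disposition
    # (decoding RFC 2231 if present) and falls back to name= in Content-Type.
    # None means no filename could be recovered for that part.
    return [part.get_filename() for part in parsed.walk()
            if part.get_content_maintype() != 'multipart']


msg = MIMEMultipart()
# Placeholder payload and filename, standing in for the real ebook.
part = MIMEApplication(b'placeholder ebook bytes', _subtype='x-mobipocket-ebook')
part.add_header('Content-Disposition', 'attachment', filename='issue.mobi')
msg.attach(part)

print(attachment_names(msg.as_string()))
```

A `None` in that list marks a part without a recoverable name, which is the situation in which a mail client has to fall back to a placeholder name of its own.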

First try

Passing a tuple of (charset, language, encoded string) instead of just the filename:

part.add_header('Content-Disposition', 'attachment', filename=('utf-8', 'fr', os.path.basename(f).encode('utf-8')))
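To see what that tuple turns into, here is a minimal Python 3 sketch that adds the same kind of parameter to a bare `email.message.Message` and prints the resulting header. The question uses Python 2.7, whose output may differ, and the filename `été.mobi` is a made-up placeholder.

```python
from email.message import Message

part = Message()
# Same (charset, language, value) tuple shape as the first try; on Python 3
# the value must be a str, not UTF-8 bytes.
part.add_header('Content-Disposition', 'attachment',
                filename=('utf-8', 'fr', 'été.mobi'))

header = part['Content-Disposition']
# The tuple form produces an RFC 2231 "filename*=" parameter whose value is
# charset'language'percent-encoded-UTF-8-bytes.
print(header)
```

Whether a particular mail client decodes this extended `filename*=` form is up to the client; the sketch only shows what gets generated.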

Second try

Setting the charset globally like this:

from email import Charset
Charset.add_charset('utf-8', Charset.QP, Charset.QP, 'utf-8')
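A sketch of what the global registration does and does not affect, written for Python 3 (where the module is spelled `email.charset`): it changes how header *text* is turned into RFC 2047 encoded words, but `add_header()` parameters are built by a separate code path and come out unchanged. The subject and filename strings are placeholders.

```python
from email import charset
from email.header import Header
from email.message import Message

# Same call as in the question, using the Python 3 module name.
charset.add_charset('utf-8', charset.QP, charset.QP, 'utf-8')

# The registration changes how header text is encoded (RFC 2047 encoded words)...
subject = Header('Livraison été', 'utf-8').encode()
print(subject)

# ...but add_header() parameters do not consult this registry.
part = Message()
part.add_header('Content-Disposition', 'attachment',
                filename=('utf-8', '', 'été.mobi'))
print(part['Content-Disposition'])
```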

Third try

Using email.utils.encode_rfc2231:

from email.Utils import encode_rfc2231
utf8filename = encode_rfc2231(os.path.basename(f).encode('utf-8'), charset='utf-8')
part.add_header('Content-Disposition', 'attachment', filename=('utf-8', 'fr', utf8filename))
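In CPython's `email.message`, the tuple form of `add_header` runs `encode_rfc2231` on the tuple's value itself, so the third try ends up encoding the name twice: the hand-encoded `charset''%XX...` text gets its `'` and `%` characters escaped again. A minimal Python 3 sketch of that effect, assuming a placeholder filename and passing a `str` (Python 3's `encode_rfc2231` does not accept the question's UTF-8 bytes):

```python
from email.message import Message
from email.utils import encode_rfc2231

name = 'été.mobi'  # placeholder non-ASCII filename

# Third try, step 1: encode once by hand.
utf8filename = encode_rfc2231(name, charset='utf-8')

# Step 2: the tuple form encodes that already-encoded text a second time.
part = Message()
part.add_header('Content-Disposition', 'attachment',
                filename=('utf-8', 'fr', utf8filename))

print(utf8filename)                 # the hand-encoded value
print(part['Content-Disposition'])  # '%' is now escaped as '%25', "'" as '%27'
print(part.get_filename())          # a parser undoes only one layer
```

The recovered name is the once-encoded string rather than `été.mobi`.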

Fourth try

Using urllib.quote() to URL-encode the filename. This has the same effect on the filename as the third method.

utf8filename = urllib.quote(os.path.basename(f).encode('utf-8'))
part.add_header('Content-Disposition', 'attachment', filename=('utf-8', 'fr', utf8filename))
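The "same effect" is expected: `encode_rfc2231` with a charset is the same percent-encoding that `quote()` performs, plus a `charset''` prefix, so either value is encoded a second time when placed in the tuple. A short Python 3 check (Python 2's `urllib.quote` lives at `urllib.parse.quote` there), with a placeholder filename:

```python
from email.utils import encode_rfc2231
from urllib.parse import quote  # Python 3 location of Python 2's urllib.quote

name = 'été.mobi'  # placeholder non-ASCII filename

# quote() percent-encodes the UTF-8 bytes (default safe='/', irrelevant here
# because a basename contains no '/').
by_quote = quote(name)
by_rfc2231 = encode_rfc2231(name, charset='utf-8')

print(by_quote)
print(by_rfc2231)
```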

Any ideas?

Am I missing something essential about RFC 2231 filename character encoding?

I use Gmail's SMTP server and Python 2.7.


Solution

  • Instead of declaring the charset explicitly like this:

    filename=('utf-8', 'fr', os.path.basename(f).encode('utf-8'))

    ...it works when I just pass the UTF-8-encoded bytes without declaring a charset:

    filename=os.path.basename(f).encode('utf-8')

    The filename is then displayed correctly.

    This seems to contradict the documentation, which states:

    If the value contains non-ASCII characters, it must be specified as a three tuple in the format (CHARSET, LANGUAGE, VALUE), where CHARSET is a string naming the charset to be used to encode the value, LANGUAGE can usually be set to None or the empty string (see RFC 2231 for other possibilities), and VALUE is the string value containing non-ASCII code points.

    This doesn't work, however. The Python 3 documentation adds:

    If a three tuple is not passed and the value contains non-ASCII characters, it is automatically encoded in RFC 2231 format using a CHARSET of utf-8 and a LANGUAGE of None.

    Only this form works, even in Python 2.7, although the 2.7 documentation doesn't mention it.
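On Python 3, the automatic encoding described in the Python 3 documentation quoted above can be checked directly: a plain non-ASCII `str` and the explicit `('utf-8', '', value)` tuple yield the same RFC 2231 parameter. This sketch does not reproduce the Python 2.7 approach of passing UTF-8 bytes, and the filename is a placeholder.

```python
from email.message import Message

name = 'été.mobi'  # placeholder non-ASCII filename

plain = Message()
# A plain str with non-ASCII characters is encoded automatically
# (charset utf-8, empty language).
plain.add_header('Content-Disposition', 'attachment', filename=name)

explicit = Message()
# The documented three-tuple with an empty language produces the same header.
explicit.add_header('Content-Disposition', 'attachment',
                    filename=('utf-8', '', name))

print(plain['Content-Disposition'])
print(explicit['Content-Disposition'])
print(plain.get_filename())  # decoded back to the original str
```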