python csv pyspark export-to-csv mime

PySpark - Send Email with CSV attached, entire CSV showing up on one line


I have a script that generates a DataFrame. I convert the DataFrame to a CSV file, then send it as an email attachment. The problem is that the header and all of the data end up in the first row, so the resulting CSV has 60k columns and 1 row. What is wrong?

Here is my code:

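import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

df.toPandas().to_csv("/dbfs/<path>/df.csv", mode='w+', encoding='utf-8')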
server = smtplib.SMTP('smtp.gmail.com:587')
server.ehlo()
server.starttls()
server.login("<email>", "<password>")

sender = "<email>"
recipient = "<email>"
msg = MIMEMultipart()
msg['Subject'] = 'I need help'
msg['From'] = sender
msg['To'] = recipient

filedata = sc.textFile("/dbfs/<path>/df.csv", use_unicode=False)
msg.attach(MIMEText('This is your test message with attachment...'))
part = MIMEApplication("".join(filedata.collect()), Name="df.csv")
part['Content-Disposition'] = 'attachment; filename="%s"' % 'df.csv'
msg.attach(part)
server.sendmail(sender, [recipient], msg.as_string())
server.close()

Solution

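  • sc.textFile splits the file into lines and strips the newline characters, so "".join(...) glues every row into one long line. Just replace

    "".join(filedata.collect())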
    

    with

    "\n".join(filedata.collect())
    

    or

    sc.wholeTextFiles("/dbfs/<path>/df.csv").values().first()
    

    or, even better, skip the write/read round trip completely (to_csv() returns the CSV as a string when called without a path):

    MIMEApplication(df.toPandas().to_csv(), Name="df.csv")
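
To see why the join separator matters, here is a minimal sketch that uses only the standard library. The list `lines` stands in for what `filedata.collect()` returns (one string per line, newlines already stripped). The helper name `build_csv_message`, the sample rows, and the example.com addresses are made up for illustration. In the original script, `csv_text` would presumably come from `df.toPandas().to_csv()` instead.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_csv_message(csv_text, sender, recipient, filename="df.csv"):
    """Build a multipart message with csv_text attached as a file (hypothetical helper)."""
    msg = MIMEMultipart()
    msg["Subject"] = "I need help"
    msg["From"] = sender
    msg["To"] = recipient
    msg.attach(MIMEText("This is your test message with attachment..."))

    # Encode explicitly so non-ASCII characters survive. MIMEApplication
    # base64-encodes the bytes, so line breaks are carried through unchanged.
    part = MIMEApplication(csv_text.encode("utf-8"), Name=filename)
    part["Content-Disposition"] = 'attachment; filename="%s"' % filename
    msg.attach(part)
    return msg


# Stand-in for sc.textFile(...).collect(): one string per line,
# with the newline characters already removed.
lines = ["id,name", "1,alice", "2,bob"]

broken = "".join(lines)    # every row glued into a single line
fixed = "\n".join(lines)   # rows separated again

# Placeholder addresses, not real accounts.
msg = build_csv_message(fixed, "sender@example.com", "recipient@example.com")

# Round-trip check: parse the serialized message and decode the attachment.
parsed = message_from_string(msg.as_string())
attachment = parsed.get_payload()[1].get_payload(decode=True).decode("utf-8")
```

Sending works the same way as in the question: `server.sendmail(sender, [recipient], msg.as_string())`.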