Search code examples
pythonhtmldatabricksamazon-sesspark-koalas

DataFrame Styler Object HTML Does Not Render Correctly in Amazon SES Email


I am working in a DataBricks Python notebook. I can currently successfully render a DataFrame Styler object inside the notebook that looks like the following:

enter image description here

I now want to email the above as a report inside the body of an email. I followed this guide: https://docs.databricks.com/_static/notebooks/kb/notebooks/send-email-aws.html and the email successfully sends but for some reason it misses some of the background colours as shown here:

enter image description here

I experimented with changing some of the colours in case it was related to the blue and grey colours that I chose but it made no difference, it always looks like the 2nd image shown above. Strangely, if I choose to open the email in the browser, the grey 'No Data' cells render but the blue index and column headers don't as shown here:

enter image description here

Regardless, I need all of the background colours to successfully render inside of the email body as per the first image.

I am unsure if the issue relates to Amazon SES or the HTML I used or something else entirely?

The following shows the preview of the email generated by the function previewHTMLEmail(html) which looks correct, so I suspect the problem is with Amazon SES? Please note that I must use Amazon SES to send the email.

enter image description here

If anyone has any suggestions of how I can successfully achieve what is shown in image 4 then it would be greatly appreciated.

Thanks


This is a generic version of my code to allow you to reproduce what I am seeing.

# IMPORT LIBRARIES.
import databricks.koalas as ks
import numpy as np
import re
import boto3
from email import encoders
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication
# CREATE DUMMY DATA.
df1 = ks.DataFrame({'date':ks.date_range('25/05/2022', '21/06/2022').to_numpy()}).sort_values(by='date',ascending=False)
df1['table_1'] = '0/96'
df1['table_2'] = '0/104'
df1['table_3'] = ['0/101','0/101','4/101','0/101','0/101','0/101','4/101','0/101','0/101','14/101','0/101','0/101','26/101','15/101','0/101','22/101','0/101','4/101','0/101','100/101','14/101','0/101','0/101','21/101','5/101','0/101','22/101','0/101']
df1['table_4'] = ['-1/-1','-1/-1','XXXX','-1/-1','XXXX','-1/-1','14/99','-1','-1/-1','-1/-1','XXXX','-1/-1','XXXX','-1/-1','22/99','-1','-1/-1','-1/-1','XXXX','-1/-1','XXXX','-1/-1','4/99','-1','-1/-1','-1/-1','XXXX','-1/-1']
# COLOUR CELLS BASED ON CONTAINED STRING VALUES.
def color_missing_items(value):

  if re.match(r'^0',value):
    color = 'green'
  elif re.match(r'^[1-9]\d*',value):
    color = 'red'
  elif value == 'No Data':
    color = 'grey'
  else:
    # USE A BRIGHT DEFAULT COLOUR TO CATCH DATA ISSUES OUT WITH THE ABOVE 3 CONDITIONS.
    color = 'purple'
  
  return 'background-color: %s' % color
# SET ALL TABLE PROPERTIES TO FORMAT AND STYLE OUTPUTTED HTML REPORT.
title_properties = {
    'selector': 'caption',
    'props': [('color', 'black'),
              ('font-size', '18px'),
              ('font-weight', 'bold'),
              ('font-family', 'Helvetica')]
}
cell_hover = {  # FOR ROW HOVER USE <tr> INSTEAD OF <td>.
    'selector': 'td:hover',
    'props': [('background-color', '#ffffb3'),
              ('color', 'black'),
              ('font-family', 'Helvetica')]
}
index_names = {
    'selector': '.index_name',
    'props': [('color', 'white'),
              ('background-color', '#000066'),
              ('font-weight', 'bold'),
              ('font-weight','normal'),
              ('font-family', 'Helvetica')]
}
headers = {
    'selector': 'th:not(.index_name)',
    'props': [('background-color', '#000066'), 
              ('color', 'white'),
              ('font-family', 'Helvetica')]
}
heading_centre = {
    'selector': 'th.col_heading', 
    'props': [('text-align', 'center')]
}
text_centre = {
    'selector': 'td', 
    'props': [('text-align', 'center'), 
              ('font-weight', 'bold')]
}
def create_report(df):

       # ALL ATTS MUST BE A SINGLE DTYPE (USED STRING) OR TRANSPOSE STEP FAILS IN DATABRICKS.
       df = df.astype(np.str)

       for table_name in list(df.columns[1:]):
              # IF VALUE STRING CONTAINS NULL REPRESENTATION (-1) THEN REPLACE VALUE WITH 'No Data'. 
              df.loc[df[f'{table_name}'].str.contains('-1'),f'{table_name}'] = 'No Data' 
              # IF VALUE STRING DOES NOT START WITH 0, A NUMBER BETWEEN 1-9 OR 'No Data' THEN REPLACE VALUE WITH 'Error'. THIS IS THE CATCH ALL FOR ANY OTHER ERROR.
              df.loc[~(df[f'{table_name}'].str.contains(r'^0'))&\
                     ~(df[f'{table_name}'].str.contains(r'^[1-9]\d*'))&\
                     ~(df[f'{table_name}'].str.contains('No Data')),f'{table_name}'] = 'Error'
        
       # TRANSPOSE TO PUT DATA INTO REPORT FORMAT.
       df2 = df.T
       # USE THE date ROW AS THE COLUMN VALUES.
       df2.columns = df2.iloc[0]
       # DROP OLD DATE ROW AS NOW HEADER.
       df3 = df2.loc[df.columns[1:]].copy()

       # RETURN REPORT.
       return df3.style.applymap(color_missing_items)\
                       .set_table_styles([title_properties,cell_hover,index_names,headers,heading_centre,text_centre])\
                       .set_properties(**{'font-family': 'Helvetica'})
# MISSING DATA REPORT IN HTML FORMAT - INSERT THIS INTO EMAIL BODY.
report_body_html = report.render()
report_body_html
displayHTML(report_body_html)
# SEND EMAIL WITH AMAZON SES.
def send_email(from_email, to_emails, subject, body_html, attachments=[], cc=[], bcc=[]):

  msg = MIMEMultipart()
  msg['Subject'] = subject
  msg['From'] = from_email
  msg['To'] = ", ".join(to_emails)
  body = MIMEText(report_body_html, 'html')#CHECK IF THIS IS REQUIRED.
    
  msg.attach(body)
  
  #for raw_attachment in attachments:
  #  attachment = MIMEApplication(open(raw_attachment, 'rb').read())
  #  attachment.add_header('Content-Disposition', 'attachment', filename=raw_attachment)
  #  msg.attach(attachment)
  
  ses = boto3.client('ses', region_name='us-east-1')
  ses.send_raw_email(
    Source=msg['FROM'],
    Destinations=to_emails,
    RawMessage={'Data': msg.as_string()})    
  print("Sending Email.")
# CREATE EMAIL HTML.
html = [
  "<center><h1>Energy: Missing Data</h1></center>",
  """
  <p><b>The Missing Data dashboard has been updated.</b></p>
  """
]
def previewHTMLEmail(html):
  displayHTML("\n".join([x for x in html if type(x) != type(type)]))
# ADD REPORT HTML TO THE EMAIL BODY:
html.append(report_body_html)
previewHTMLEmail(html)
# EMAIL METADATA.
from_addr = "***@hotmail.com"
to_addrs = [
  "xyz@outlook.com"
]
subject = "Energy: Missing Data Report"
# SEND THE EMAIL.
send_email(from_email=from_addr, to_emails=to_addrs, subject=subject, body_html=html)

Solution

  • In case this helps anyone in the future experiencing similar issues, I worked it out.

    Firstly, for some reason the email would not render the grey background colour used for the 'No Data' cells unless I used the hex code for grey, so my updated function color_missing_items(value) now looks like this:

    # COLOUR CELLS BASED ON CONTAINED STRING VALUES.
    def color_missing_items(value):
    
      if re.match(r'^0',value):
        color = 'green'
      elif re.match(r'^[1-9]\d*',value):
        color = 'red'
      elif value == 'No Data':
        color = '#808080'
      else:
        # USE A BRIGHT DEFAULT COLOUR TO CATCH DATA ISSUES OUT WITH THE ABOVE 3 CONDITIONS.
        color = 'purple'
      
      return 'background-color: %s' % color
    

    The blue background colour used for the row and column headers not rendering was due to an issue with my table properties - I have updated these to the following:

    # SET ALL TABLE PROPERTIES TO FORMAT AND STYLE OUTPUTTED HTML REPORT.
    title_properties = {
        'selector': 'caption',
        'props': [('color', 'black'),
                  ('font-size', '18px'),
                  ('font-weight', 'bold'),
                  ('font-family', 'Helvetica')]
    }
    cell_hover = {  # FOR ROW HOVER USE <tr> INSTEAD OF <td>.
        'selector': 'td:hover',
        'props': [('background-color', '#ffffb3'),
                  ('color', 'black'),
                  ('font-family', 'Helvetica')]
    }
    headers = {
        'selector': 'th',
        'props': [('background-color', '#000066'), 
                  ('color', 'white'),
                  ('font-family', 'Helvetica'),
                  ('border-color', 'black'),
                  ('border-style', 'solid'),
                  ('border-width', '1px')]
    }
    heading_centre = {
        'selector': 'th.col_heading', 
        'props': [('text-align', 'center')]
    }
    text_centre = {
        'selector': 'td', 
        'props': [('text-align', 'center'), 
                  ('font-weight', 'bold'),
                  ('border-color', 'black'),
                  ('border-style', 'solid'),
                  ('border-width', '1px')]
    }
    

    Now everything works as desired and the sent email looks like the following:

    enter image description here

    I am still unsure why what I originally had worked in the DataBricks notebook but not in the Amazon SES email.