Search code examples
pythonpandasmandrill

Pandas python, what is the workbook encoding type?


I am new to python and also to the pandas library in Python. The documentation is not well described and they didn't explain it well. I want to save the dataframe as an excel format and in the memory and I have found the following explanation: [Pandas excel to the memory]

I need an explanation about workbook. The value of this variable is encoded, how can I see the real value of this variable? How to decode it? what should be return value for it?

EDITED:

How to pass it into the content of attachment in Mandrill api. https://mandrillapp.com/api/docs/messages.python.html

This is my attachment part for excel extension :

'attachments': [
            {
                'content': content,
                'name': 'fraud_report.xlsx',
                'type': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
            }

I am not able to open the excel file and all the time i'am getteing an error from Microsoft excel which said the file format is not valid!... Any help can be helpful. Thank you


Solution

  • For the explanation i paste the example from your link here again:

    # Safe import for either Python 2.x or 3.x
    try:
        from io import BytesIO
    except ImportError:
        from cStringIO import StringIO as BytesIO
    
    bio = BytesIO()
    
    # By setting the 'engine' in the ExcelWriter constructor.
    writer = ExcelWriter(bio, engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1')
    
    # Save the workbook
    writer.save()
    
    # Seek to the beginning and read to copy the workbook to a variable in memory
    bio.seek(0)
    workbook = bio.read()
    

    The method writer.save() saves the data in BytesIO (bio) instead of Excel file. It means, that the variable bio stores the bytes codes of the excel file.

    The method bio.seek(0) sets the current position (for reading, writing,...) of bio to 0. So that you can read the data of bio from beginning with the next method bio.read().

    The variable workbook store the bytes string of the excel file (or excel workbook). If you read a excel file in byte mode, you will get the same data. Or you can write it in a excel file:

    with open("my_excel_file.xlsx", "wb") as f:
       f.write(workbook)
    

    To read data from bio and store in a DataFrame you don't need bio.read():

    bio.seek(0)
    df = pd.read_excel(bio, "Sheet1", engine="xlrd")
    

    For your question about using of mandrill:

    In the example of mandrill you see:

    {'attachments': [{'content': 'ZXhhbXBsZSBmaWxl',
                          'name': 'myfile.txt',
                          'type': 'text/plain'}],...
    

    The documentation writes about it too:

    content: the content of the attachment as a base64-encoded string

    You should encode workbook in base64 and use it for sending

    import base64
    content = base64.b64encode(workbook)
    

    P/S: workbook and content have the type bytes. May be you need to convert content to str before sending.

    {'attachments': [{'content': content.decode('utf-8'),
                              'name': 'myfile.xlsx',
                              'type': 'text/plain'}],...
    

    Add: if file is excel then you should change type to application/vnd.openxmlformats-officedocument.spreadsheetml.sheet