Tags: python, excel, io, base64, plotly-dash

Plotly Dash: How to reproduce the 'contents' output of dcc.Upload (i.e. the base64 encoded string)?


I am not able to reproduce the exact output of the contents property of the dcc.Upload component.

If I upload the file my_excel.xlsx to the dcc.Upload component, my callback-function receives a "base64 encoded string" (according to the dcc.Upload documentation). I don’t know how to reproduce the exact same string without the dcc.Upload component (I want to use the Output for Unit Tests).

My current approach:

import base64
import io
import pandas as pd

# Attempt to reproduce the output of the dcc.Upload component
with open('tests/data/my_excel.xlsx', 'rb') as file:
    raw_data = file.read()

# raw_data is meant to stand in for the contents value received from dcc.Upload

# These steps raise no error when run on the real output of dcc.Upload
_, content_string = raw_data.split(',')  # this fails
decoded = base64.b64decode(content_string)
df = pd.read_excel(io.BytesIO(decoded))

I get the error TypeError: a bytes-like object is required, not 'str'.

If I add

raw_data = base64.b64encode(raw_data)

before the raw_data.split(','), I get the same error.

How do I get the exact same "base64 encoded string" without the dcc.Upload Component?


Solution

  • I could not find a single function to reproduce the contents property of dcc.Upload, but was able to manually create the output of dcc.Upload.

    From the documentation we have:

    contents is a base64 encoded string that contains the files contents [...] Property accept (string; optional): Allow specific types of files. See https://github.com/okonet/attr-accept for more information. Keep in mind that mime type determination is not reliable across platforms. CSV files, for example, are reported as text/plain under macOS but as application/vnd.ms-excel under Windows. In some cases there might not be a mime type set at all.

    Inspecting the contents string reveals that it is composed of two parts separated by a comma:

    content_type, content_string = contents.split(',')
    

    Inspecting further shows:
    content_type: a header containing the MIME type of the file, of the form data:<mime type>;base64
    content_string: the base64 encoded content of the file
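To make the two parts concrete, here is a minimal sketch that builds and splits such a string from in-memory bytes. The payload and the MIME type are made-up placeholders for illustration, not what dcc.Upload would report for a real file.

```python
import base64

# Hypothetical payload and MIME type, chosen only for illustration.
payload = b"hello, upload"
encoded = base64.b64encode(payload).decode("ascii")  # bytes -> str
contents = "data:text/plain;base64," + encoded

# The base64 alphabet contains no comma, so splitting on the first
# comma separates the header from the encoded payload.
content_type, content_string = contents.split(",", 1)

print(content_type)  # data:text/plain;base64
print(base64.b64decode(content_string) == payload)  # True
```

Note that contents is a str. Both the raw bytes read from the file and the result of base64.b64encode are bytes objects, and calling .split(',') on bytes with a str separator raises the TypeError quoted in the question.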

    import base64
    import io
    import pandas as pd
    import magic  # third-party package python-magic
    
    filepath = 'tests/data/my_excel.xlsx'
    
    # Reproduce output of dcc.Upload Component
    with open(filepath, "rb") as file:
        decoded = file.read()
    content_bytes = base64.b64encode(decoded)
    content_string = content_bytes.decode("utf-8")
    
    mime = magic.Magic(mime=True)
    mime_type = mime.from_file(filepath)
    content_type = "".join(["data:", mime_type, ";base64"])
    
    contents = "".join([content_type, ",", content_string])
    
    # And now the reverse: convert contents back into a binary file stream
    content_type, content_string = contents.split(",")
    decoded = base64.b64decode(content_string)
    df = pd.read_excel(io.BytesIO(decoded))
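
For unit tests, the same steps could be wrapped in a small helper. The sketch below is one possible variant, not part of Dash: the function names make_upload_contents and parse_upload_contents are hypothetical, and it swaps python-magic for the standard-library mimetypes module, which guesses the MIME type from the file extension only. The guessed type may therefore differ from what a browser sends, and the application/octet-stream fallback is an assumption.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def make_upload_contents(path):
    """Build a string shaped like the dcc.Upload contents value.

    Hypothetical helper. The MIME type is guessed from the file
    extension, so it may differ from what a browser would report.
    """
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None:
        mime_type = "application/octet-stream"  # assumed fallback
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def parse_upload_contents(contents):
    """Split a contents string into its header and the decoded bytes."""
    content_type, content_string = contents.split(",", 1)
    return content_type, base64.b64decode(content_string)


# Round trip on a throwaway file so the example does not depend on
# tests/data/my_excel.xlsx being present.
original = b"a,b\n1,2\n"
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes(original)
    contents = make_upload_contents(sample)

header, decoded = parse_upload_contents(contents)
print(header)
print(decoded == original)
```

In a test, the string returned by make_upload_contents can be passed directly to the callback in place of the value that dcc.Upload would supply. If the callback depends on the exact MIME type, it is safer to hard-code the expected header than to rely on the guess.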