python pandas image csv figure

Ways to convert an entry from a .csv file back to a figure


I am trying to save figure data to a .csv file using PIL's Image, but I'm stuck at converting it back to a figure. It keeps giving me the error "AttributeError: 'str' object has no attribute '__array_interface__'".

I suppose it means that my entry extracted from the .csv file is a string and needs to be converted to an array?

My code that converts the figure to the np array looks like this:

import numpy as np
from PIL import Image

img = np.array(Image.open(fig_file))

file_name = 'data.csv'
row_contents = [labels, img]

from csv import writer
def append_list_as_row(file_name, list_of_elem):
    # Open file in append mode
    with open(file_name, 'a+', newline='') as write_obj:
        # Create a writer object from csv module
        csv_writer = writer(write_obj)
        # Add contents of list as last row in the csv file
        csv_writer.writerow(list_of_elem)

append_list_as_row(file_name, row_contents)

And the problematic part (converting it back to a figure) looks like this:

import pandas as pd
df1 = pd.read_csv(file_name)
fig_array = df1.loc[1, "img"]
img = Image.fromarray(fig_array, 'RGB')
img.save('test.png')

The Image.fromarray line causes the error. Maybe I shouldn't use pandas to locate the entry? Any ideas on how to modify this? I tried .to_numpy(), but it doesn't work.

Thank you so much!


Solution
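
It may help to see first why the round trip in the question fails. The csv module writes the text form (`str()`) of any non-string cell, and the reader always hands back plain strings, so `Image.fromarray` receives a `str` rather than an array. The sketch below uses a small nested list as a stand-in for the image array (an assumption made to keep it dependency-free). For a large NumPy array the text form is additionally abbreviated with `...`, so the pixel values cannot be recovered from it.

```python
import csv
import io

# Stand-in for an image array (hypothetical 2-pixel "image").
pixels = [[255, 0, 0], [0, 255, 0]]

# io.StringIO stands in for a real file on disk.
buf = io.StringIO()
csv.writer(buf).writerow(["fine-arts", pixels])

# Read the row back: every cell comes back as a plain string.
buf.seek(0)
label, cell = next(csv.reader(buf))

print(type(cell).__name__)  # str
print(cell)                 # [[255, 0, 0], [0, 255, 0]]
```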

  • First of all, if possible, DON'T DO THIS. It is too costly. Instead, make a table (dataframe) that records the label(s) associated with each file and can be queried later on. E.g.

    | file_id | file_path | label     |
    |---------|-----------|-----------|
    | 1       | a.jpg     | fine-arts |
    | 2       | b.png     | manga     |
    | 3       | c.jpg     | whatever  |
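
A lookup table like this could be sketched with the standard csv module as follows. The file names, the in-memory buffer, and the helper `paths_with_label` are hypothetical and only for illustration; a pandas dataframe would serve equally well.

```python
import csv
import io

# Hypothetical records: one row per image file, no pixel data stored.
records = [
    {"file_id": 1, "file_path": "a.jpg", "label": "fine-arts"},
    {"file_id": 2, "file_path": "b.png", "label": "manga"},
    {"file_id": 3, "file_path": "c.jpg", "label": "whatever"},
]

# io.StringIO stands in for a real index file opened with newline="".
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["file_id", "file_path", "label"])
writer.writeheader()
writer.writerows(records)

# Load the index back; each row is a dict of strings.
buf.seek(0)
index = list(csv.DictReader(buf))

def paths_with_label(rows, label):
    """Return the file paths of all rows carrying the given label."""
    return [row["file_path"] for row in rows if row["label"] == label]

manga_paths = paths_with_label(index, "manga")
print(manga_paths)  # ['b.png']
```

Each returned path can then be opened with `Image.open` only when the image is actually needed.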
    

    If you REALLY have to encode images into strings, base64 encoding is a common way to go. For example, Jupyter Notebook uses base64 to embed images into the HTML file so users can share the resulting images easily.
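
As a rough illustration of what base64 does, here is a minimal round trip on arbitrary bytes. The payload is a made-up stand-in, not a real image, and the `data:` URI is shown only as an example of how encoded images are commonly embedded in HTML.

```python
import base64

# Stand-in for real image bytes (hypothetical payload, not a valid PNG).
raw_bytes = bytes(range(256)) * 4

# Encode to an ASCII string; the base64 alphabet contains no commas,
# quotes, or newlines, so the result fits in a single CSV cell.
encoded = base64.b64encode(raw_bytes).decode("ascii")

# The kind of string an HTML <img src="..."> attribute can carry.
data_uri = f"data:image/png;base64,{encoded}"

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded)
assert decoded == raw_bytes

print(len(raw_bytes), len(encoded))  # 1024 1368
```

The encoded string is about one third larger than the raw bytes, which is part of the cost mentioned above.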

    Second, it is still NOT recommended to save the (label, data) pair in a csv file, due to the limits spreadsheet software places on cell size. And if one can't take advantage of the .csv format, why use it? Therefore, in this case, it is still better to make the lookup table mentioned above and avoid the unnecessary, costly transformation.

    If you are still doing it, OK, here is the sample code. The SMALL image is taken from the Debian homepage, and one can verify that the data is restored correctly.

    Code:

    import numpy as np
    from PIL import Image
    import base64
    import csv
    
    # https://www.debian.org/Pics/openlogo-50.png
    img_path = "/mnt/ramdisk/debian-openlogo-50.png"
    
    img = np.array(Image.open(img_path))
    img_encoded = base64.b64encode(img).decode("ascii")
    label = "fine-arts"
    
    # Simulate multiple records
    data = [
        [label, img_encoded],
        [label, img_encoded],
        [label, img_encoded]
    ]
    
    # save
    with open("/mnt/ramdisk/sav.csv", "w+", newline="") as f:  # newline="" as the csv docs recommend
        w = csv.writer(f)
        w.writerows(data)
    
    # load
    data_loaded = []
    with open("/mnt/ramdisk/sav.csv", newline="") as f:
        r = csv.reader(f)
        for row in r:
            data_loaded.append(row)
    
    # check data are unchanged after S/L
    for i in range(3):
        for j in range(2):
            assert data[i][j] == data_loaded[i][j]
    
    # decode the image; base64 carries no shape info, so the
    # (height, width, channels) of this RGBA logo is hard-coded
    raw = base64.b64decode(data_loaded[0][1].encode("ascii"))
    img_decoded = np.frombuffer(raw, dtype=np.uint8).reshape((61, 50, 4))
    
    # check image is restored correctly
    import matplotlib.pyplot as plt
    plt.imshow(img_decoded)
    plt.show()
    

    However, if a larger image such as the Mona Lisa is used, the csv reader will complain:

    _csv.Error: field larger than field limit (131072)
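
If the file really must be read, the limit can be raised with `csv.field_size_limit`. The sketch below reproduces the error with a long dummy field and then retries with a larger limit. The field content and the new limit of 10,000,000 are arbitrary choices for illustration. Passing `sys.maxsize` is sometimes suggested but can raise `OverflowError` on some platforms, so a fixed value is used here.

```python
import csv
import io

# Build a CSV row whose second field exceeds the default limit.
big_field = "A" * 200_000  # stand-in for a long base64 string
buf = io.StringIO()
csv.writer(buf).writerow(["fine-arts", big_field])
csv_text = buf.getvalue()

def read_rows(text):
    """Parse CSV text into a list of rows."""
    return list(csv.reader(io.StringIO(text)))

# 131072 is the default limit; set it explicitly so the demo is reproducible.
csv.field_size_limit(131072)
try:
    read_rows(csv_text)
    failed_with_default = False
except csv.Error:
    failed_with_default = True

# Raise the limit (value chosen arbitrarily for this sketch) and retry.
csv.field_size_limit(10_000_000)
rows = read_rows(csv_text)

print(failed_with_default)  # True
print(len(rows[0][1]))      # 200000
```

Note that the limit is a module-wide setting, so it affects every csv reader created afterwards in the same process.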
    

    And you still need the image shape to restore the dimensions, so a third column storing the shape is actually needed.
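
One possible way to carry the shape along is sketched below: each row stores the label, the shape as a JSON list, the dtype name, and the base64 data. The column layout, the helper names `encode_row` and `decode_row`, and the tiny synthetic array are all assumptions made for this example.

```python
import base64
import csv
import io
import json

import numpy as np

# Hypothetical 3x2 RGBA image standing in for np.array(Image.open(...)).
img = np.arange(3 * 2 * 4, dtype=np.uint8).reshape((3, 2, 4))

def encode_row(label, array):
    """Return [label, shape, dtype, base64 pixels] as CSV-safe strings."""
    shape_text = json.dumps(list(array.shape))  # e.g. "[3, 2, 4]"
    data_text = base64.b64encode(array.tobytes()).decode("ascii")
    return [label, shape_text, str(array.dtype), data_text]

def decode_row(row):
    """Rebuild (label, array) from a row produced by encode_row."""
    label, shape_text, dtype_text, data_text = row
    flat = np.frombuffer(base64.b64decode(data_text), dtype=np.dtype(dtype_text))
    return label, flat.reshape(json.loads(shape_text))

# io.StringIO stands in for a real file opened with newline="".
buf = io.StringIO()
csv.writer(buf).writerow(encode_row("fine-arts", img))

buf.seek(0)
label, restored = decode_row(next(csv.reader(buf)))

assert np.array_equal(restored, img)
print(label, restored.shape)  # fine-arts (3, 2, 4)
```

With the shape stored per row, the hard-coded `reshape` is no longer needed, although the field size limit still applies to large images.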