I am trying to save figure info to a .csv file using PIL's Image, but I'm stuck at converting it back to a figure. It keeps giving me the error "AttributeError: 'str' object has no attribute '__array_interface__'".
I suppose it means that the entry extracted from the .csv file is a string and needs to be converted to an array?
My code that converts the figure to the np array looks like this:
import numpy as np
from PIL import Image
img = np.array(Image.open(fig_file))
file_name = 'data.csv'
row_contents = [labels, img]
from csv import writer
def append_list_as_row(file_name, list_of_elem):
    # Open file in append mode
    with open(file_name, 'a+', newline='') as write_obj:
        # Create a writer object from csv module
        csv_writer = writer(write_obj)
        # Add contents of list as last row in the csv file
        csv_writer.writerow(list_of_elem)
append_list_as_row(file_name, row_contents)
And the problematic part (convert it back to a figure) looks like this:
import pandas as pd
df1 = pd.read_csv(file_name)
fig_array = df1.loc[1, "img"]
img = Image.fromarray(fig_array, 'RGB')
img.save('test.png')
The Image.fromarray line causes the error. Maybe I shouldn't use pandas to locate the entry? Any ideas on how to fix this? I tried .to_numpy(), but it doesn't work.
Thank you so much!
First of all, if possible, DON'T DO THIS. It is too costly. Just make a table (dataframe) that records the label(s) associated with each file and can be queried later on. E.g.
| file_id | file_path | label |
|---------|-----------|-----------|
| 1 | a.jpg | fine-arts |
| 2 | b.png | manga |
| 3 | c.jpg | whatever |
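A minimal sketch of such a lookup table with pandas, modelled on the table above. The file names, labels, and the `labels.csv` output path are placeholders for illustration, not anything from the question:

```python
import pandas as pd

# Hypothetical records: one row per image file, no pixel data involved.
records = [
    {"file_id": 1, "file_path": "a.jpg", "label": "fine-arts"},
    {"file_id": 2, "file_path": "b.png", "label": "manga"},
    {"file_id": 3, "file_path": "c.jpg", "label": "whatever"},
]
lookup = pd.DataFrame(records)

# Persist only the small table; the images stay on disk as ordinary files.
lookup.to_csv("labels.csv", index=False)

# Later: reload the table and query the paths that carry a given label.
lookup = pd.read_csv("labels.csv")
manga_paths = lookup.loc[lookup["label"] == "manga", "file_path"].tolist()
print(manga_paths)
```

Each returned path can then be opened with `Image.open(path)` when the pixels are actually needed, so nothing has to be converted to text and back.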
If you REALLY have to encode images into strings, base64 encoding is a common way to go. For example, Jupyter Notebook uses the base64 format to embed images into the HTML file so users can share the resulting images easily.
Second, it is still NOT recommended to save the (label, data) pair as a csv file, because spreadsheet software limits how much text a single cell can hold. And if one can't take advantage of the .csv format, why use it? Therefore, in this case, it is still better to make the lookup table mentioned above and avoid the unnecessary, costly transformation.
If you are still doing it, OK, here is the sample code. The SMALL image is taken from the Debian homepage. One can verify that the data is correctly restored.
Code:
import numpy as np
from PIL import Image
import base64
import csv
# https://www.debian.org/Pics/openlogo-50.png
img_path = "/mnt/ramdisk/debian-openlogo-50.png"
img = np.array(Image.open(img_path))
# Encode the raw pixel bytes explicitly (shape and dtype are NOT stored)
img_encoded = base64.b64encode(img.tobytes()).decode("ascii")
label = "fine-arts"
# Simulate multiple records
data = [
    [label, img_encoded],
    [label, img_encoded],
    [label, img_encoded]
]
# save (newline="" is what the csv module expects for its file objects)
with open("/mnt/ramdisk/sav.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerows(data)
# load
data_loaded = []
with open("/mnt/ramdisk/sav.csv", newline="") as f:
    r = csv.reader(f)
    for row in r:
        data_loaded.append(row)
# check data are unchanged after S/L
for i in range(3):
    for j in range(2):
        assert data[i][j] == data_loaded[i][j]
# decode the image (still need shape info)
raw = base64.b64decode(data_loaded[0][1].encode("ascii"))
# (61, 50, 4) is the shape of this particular image, hard-coded here
img_decoded = np.frombuffer(raw, dtype=np.uint8).reshape((61, 50, 4))
# check image is restored correctly
import matplotlib.pyplot as plt
plt.imshow(img_decoded)
plt.show()
However, if a larger image such as the Mona Lisa is used, the csv reader will complain:
_csv.Error: field larger than field limit (131072)
And you still need the image shape to restore the dimensions. Therefore, a third column storing the image shape is actually needed.
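As a rough sketch of both points, the shape can be stored as a JSON string in its own column, and the reader's field limit can be raised with `csv.field_size_limit`. Everything specific here is an assumption for illustration: the synthetic array stands in for `np.array(Image.open(...))`, the file name `sav_with_shape.csv` and the limit of 10,000,000 characters are arbitrary, and the dtype is assumed to be `uint8`.

```python
import base64
import csv
import json

import numpy as np

# Raise the per-field limit (131072 characters by default) so that
# larger encoded images can be read back. The value is arbitrary.
csv.field_size_limit(10_000_000)

# Synthetic stand-in for a real image: 300 x 200 pixels, RGBA, uint8.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(300, 200, 4), dtype=np.uint8)

# Three columns: label, shape (as JSON), base64-encoded pixel bytes.
row = [
    "fine-arts",
    json.dumps(img.shape),
    base64.b64encode(img.tobytes()).decode("ascii"),
]

csv_path = "sav_with_shape.csv"  # hypothetical output file
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerow(row)

# Load the row and rebuild the array from the stored shape.
with open(csv_path, newline="") as f:
    label, shape_json, encoded = next(csv.reader(f))

shape = tuple(json.loads(shape_json))
restored = np.frombuffer(base64.b64decode(encoded), dtype=np.uint8).reshape(shape)

assert np.array_equal(img, restored)
```

The encoded field here is 320,000 characters long, so reading it back would fail under the default limit. If the arrays can have other dtypes, the dtype would need its own column as well.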