I have a pandas DataFrame with the columns ["x", "y", "r", "g", "b"], where x and y denote the coordinates of a pixel and r, g, b denote its RGB value. The rows contain entries for each coordinate of a grid of pixels and are unique. How can I display this DataFrame using matplotlib's imshow()? This requires reshaping the data into an array of shape (M, N, 3).
My usual approach of using plt.imshow(df.pivot(columns="x", index="y", values="i"), interpolation="nearest") only works for greyscale images. Passing ["r", "g", "b"] as the values argument yields a DataFrame with a MultiIndex as columns. However, I fail to convert this into a correct image. Simply calling .reshape(M, N, 3) creates a wrong image.
I also had the idea of creating a new column with df["rgb"] = list(zip(df.r, df.g, df.b)). However, I'm not sure how to convert the resulting tuples into a new axis for the ndarray.
There is an easy way to do this. First, make sure the DataFrame is sorted by x and then by y using df = df.sort_values(by=['x', 'y']).
Next, select only the three columns for r, g and b from the DataFrame by calling df[['r', 'g', 'b']]. Convert them into a NumPy array by calling df[['r', 'g', 'b']].values, which returns an array of shape (M*N, 3), where M is the number of distinct x values and N the number of distinct y values, i.e. the width and height of your image.
Now reshape that array into the shape (M, N, 3) and you are done.
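# Order the rows by x first, then by y
df = df.sort_values(by=['x', 'y'])
# Array of shape (M*N, 3)
values = df[['r', 'g', 'b']].values
# First axis indexes x, second axis indexes y
image = values.reshape(df['x'].max() + 1, df['y'].max() + 1, 3)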
I'm assuming here that your x and y values in the DataFrame start at 0, which is why I add 1 to each dimension. If your x and y values start at 1, reshape with the shape (df['x'].max(), df['y'].max(), 3) instead.
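If you would rather not depend on where the coordinates start, one option is to count the distinct coordinate values instead of using max(). The following is only a sketch on a made-up 3x2 grid whose coordinates start at 1; the data and the names n_x and n_y are illustrative assumptions, and it presumes the grid is complete (every x/y combination occurs exactly once).

```python
import pandas as pd

# Hypothetical 3x2 grid (width 3, height 2) with 1-based coordinates.
width, height = 3, 2
rows = [
    {"x": x, "y": y, "r": 10 * x, "g": 10 * y, "b": 0}
    for y in range(1, height + 1)
    for x in range(1, width + 1)
]
# Shuffle the rows to show that the initial row order does not matter.
df = pd.DataFrame(rows).sample(frac=1, random_state=0)

df = df.sort_values(by=["x", "y"])
values = df[["r", "g", "b"]].to_numpy()  # shape (M*N, 3)

# Count distinct coordinates instead of using max() + 1, so the offset
# of the coordinates is irrelevant (assumes a complete grid).
n_x = df["x"].nunique()
n_y = df["y"].nunique()
image = values.reshape(n_x, n_y, 3)

print(image.shape)
```

Here image[i, j] holds the colour of the pixel with the i-th smallest x and the j-th smallest y, whatever the actual coordinate values are.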
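Depending on what you consider the x and y dimensions of your image, you might have to transpose the array at the end. With the sort order above, the first axis of the array runs along x, whereas imshow() treats the first axis as rows (the vertical direction), so swapping the first two axes with image.transpose(1, 0, 2) gives the usual orientation.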
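To illustrate the transpose step, here is a small sketch on a made-up 4x2 image with 0-based coordinates. The test data and the variable names are assumptions for illustration only. It also shows that sorting by y first and reshaping to (height, width, 3) should give the same result without a transpose.

```python
import numpy as np
import pandas as pd

# Hypothetical image: 4 pixels wide (x) and 2 pixels high (y), 0-based.
width, height = 4, 2
xs, ys = np.meshgrid(np.arange(width), np.arange(height))
df = pd.DataFrame({
    "x": xs.ravel(),
    "y": ys.ravel(),
    "r": xs.ravel() * 60,   # red increases with x
    "g": ys.ravel() * 200,  # green increases with y
    "b": 0,
})

# Sorting by x, then y yields an array indexed as [x, y, channel].
by_x = df.sort_values(by=["x", "y"])[["r", "g", "b"]].to_numpy()
image_xy = by_x.reshape(width, height, 3)

# imshow expects [row, column, channel], i.e. [y, x, channel],
# so swap the first two axes.
image = image_xy.transpose(1, 0, 2).astype(np.uint8)

# Alternative: sort by y first and reshape to (height, width, 3) directly.
by_y = df.sort_values(by=["y", "x"])[["r", "g", "b"]].to_numpy()
image_direct = by_y.reshape(height, width, 3).astype(np.uint8)

print(image.shape)
print(np.array_equal(image, image_direct))
```

Passing image to plt.imshow(image, interpolation="nearest") should then draw x along the horizontal axis. By default imshow() puts row 0 at the top, so y increases downwards.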