python pandas dataframe matplotlib imshow

Plotting a Pandas DataFrame with RGB values and coordinates


I have a pandas DataFrame with the columns ["x", "y", "r", "g", "b"], where x and y denote the coordinates of a pixel and r, g, b denote its RGB value. There is exactly one row for each coordinate of a grid of pixels. How can I display this DataFrame using matplotlib's imshow()? This requires reshaping the data into an array of shape (M, N, 3).

My usual approach of using plt.imshow(df.pivot(columns="x", index="y", values="i"), interpolation="nearest") only works for greyscale images. Passing ["r", "g", "b"] as the values argument yields a DataFrame with a MultiIndex as columns, but I have not managed to convert this into a correct image. Simply calling .reshape(M, N, 3) on its values produces a wrong image.

I also had the idea of creating a new column with df["rgb"] = list(zip(df.r, df.g, df.b)). However, I'm not sure how to convert the resulting tuples into a new axis of the ndarray.


Solution

  • There is an easy way to do this. First, make sure the DataFrame is sorted by x and then by y, using df = df.sort_values(by=['x', 'y']).

    Next, select only the three columns r, g and b by calling df[['r', 'g', 'b']], and convert them into a numpy array with df[['r', 'g', 'b']].values. This returns an array of shape (M*N, 3), where M is the number of distinct x values and N the number of distinct y values. Because the rows are sorted, each consecutive block of N rows belongs to one x value, so reshaping the array to (M, N, 3) puts every pixel in the right place.

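    # Order the rows so that x varies slowest and y fastest.
    df = df.sort_values(by=['x', 'y'])
    values = df[['r', 'g', 'b']].values  # shape (M*N, 3)
    # Assumes x and y start at 0 and cover the full grid.
    image = values.reshape(df['x'].max() + 1, df['y'].max() + 1, 3)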
    

    I'm assuming here that the x and y values in the DataFrame start at 0, which is why I add 1 to get the dimensions. If your x and y values start at 1, reshape to (df['x'].max(), df['y'].max(), 3) instead.

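    The resulting array is indexed as image[x, y], whereas imshow() draws the first axis vertically. Depending on what you consider the x and y dimensions of your image, you might therefore have to swap the first two axes at the end, for example with image.transpose(1, 0, 2).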
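
As a minimal sketch of the steps above, the following variant sorts by y first and then by x, so the array comes out as (height, width, 3) and needs no transpose. This is a small departure from the ordering used in the answer. The function name frame_to_image and the demo data are hypothetical, and the sketch assumes the DataFrame holds exactly one row per pixel of a complete rectangular grid. It counts distinct coordinates instead of using max() + 1, so it does not matter whether the coordinates start at 0 or 1.

```python
import numpy as np
import pandas as pd


def frame_to_image(df):
    """Turn a long-format x/y/r/g/b DataFrame into a (height, width, 3) array.

    Hypothetical helper: assumes one row per pixel of a complete grid.
    """
    width = df["x"].nunique()
    height = df["y"].nunique()
    # Cheap sanity check; it catches missing or surplus rows, not every
    # possible malformed grid.
    if len(df) != width * height:
        raise ValueError("row count does not match a full width x height grid")
    # Sort with y as the slow key so that each run of `width` rows is one
    # image row, which is the layout imshow() draws.
    ordered = df.sort_values(by=["y", "x"])
    return ordered[["r", "g", "b"]].to_numpy().reshape(height, width, 3)


# Made-up demo: an image 3 pixels wide and 2 pixels high, rows shuffled.
demo = pd.DataFrame({
    "x": [0, 1, 2, 0, 1, 2],
    "y": [0, 0, 0, 1, 1, 1],
    "r": [255, 0, 0, 10, 20, 30],
    "g": [0, 255, 0, 10, 20, 30],
    "b": [0, 0, 255, 10, 20, 30],
}).sample(frac=1, random_state=0)

image = frame_to_image(demo)

# Sorting by x first, as in the answer, gives the same picture once the
# first two axes are swapped.
by_x = demo.sort_values(by=["x", "y"])[["r", "g", "b"]].to_numpy()
by_x = by_x.reshape(3, 2, 3)
assert np.array_equal(by_x.transpose(1, 0, 2), image)
```

The resulting array can be passed directly to plt.imshow(image, interpolation="nearest"). Note that imshow() expects RGB data either as integers in the range 0 to 255 or as floats in the range 0 to 1.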