python pandas dataframe matplotlib scatter-plot

ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers


homicide_scatter_df.plot.scatter(x='Homicide',y='Area Name',s = 225,
                                c = 'Population/mil', colormap='viridis', 
                                 sharex=False)

The code above works: I get my scatter chart, with the dots changing colour depending on the area population.

fig, ax = plt.subplots(figsize=(10, 6))

ax.scatter(x = homicide_scatter_df['Homicide'], 
           y = homicide_scatter_df['Area Name'],
           s = 225,
           c = 'Population/mil', colormap='viridis',
           sharex=False)

However, the code above throws an error regarding c:

ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not Population/mil


Update: After fixing the column issue, there is still an issue with the colour bar.

On the right of the pandas chart I get a colour range bar with the label "Population/mil". The matplotlib version does not present the same colour bar with a range of colours. Is it possible to get the same colour bar using the second method?

Update: I now have the colour bar, but the colours are in the opposite order to the data.

norm = colors.Normalize(homicide_scatter_df['Population/mil'].max(), homicide_scatter_df['Population/mil'].min())
plt.xticks(rotation=90)
fig.colorbar(cm.ScalarMappable(norm=norm,cmap='YlGnBu'), ax=ax)

The above code shows the bar in the correct place and with the correct values. However, the colours on the bar run in the opposite order to the colours of the dots.

How can I change the colours to ascend the correct way?


Solution

  • In the pandas plot, c='Population/mil' works because pandas already knows this is a column of homicide_scatter_df.

    In the matplotlib plot, you need to either pass the full column like you did for x and y:

    ax.scatter(x=homicide_scatter_df['Homicide'], 
               y=homicide_scatter_df['Area Name'],
               c=homicide_scatter_df['Population/mil'], # actual column, not column name
               s=225, cmap='viridis')                   # cmap, not colormap; no sharex in matplotlib
    

    Or if you specify the data source, then you can just pass the column names similar to pandas:

    From the matplotlib documentation of the data parameter: if given, parameters can also accept a string s, interpreted as data[s].

    Colormap handling is also different in matplotlib. Change colormap to cmap, drop sharex (a pandas plotting argument that ax.scatter does not accept), and add the colorbar manually from the object that scatter returns:

    sc = ax.scatter(data=homicide_scatter_df, # define data source
                    x='Homicide',             # column names
                    y='Area Name',
                    c='Population/mil',
                    cmap='viridis',           # colormap -> cmap
                    s=225)
    fig.colorbar(sc, ax=ax, label='Population/mil') # manually add colorbar
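
For the second update (bar colours not matching the dots), here is a minimal sketch rather than a confirmed diagnosis. The DataFrame values are made up for illustration; only the column names come from the question. The signature is Normalize(vmin, vmax), so the minimum goes first, whereas the question's call passes the maximum first. More generally, a bar built from a separate ScalarMappable has to be kept in sync with the scatter's norm and cmap by hand. Passing the object returned by ax.scatter to fig.colorbar avoids that, since the bar then reuses the same norm and colormap as the dots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import colors

# Hypothetical stand-in data; only the column names are taken from the question.
homicide_scatter_df = pd.DataFrame({
    "Area Name": ["A", "B", "C", "D"],
    "Homicide": [12, 30, 7, 21],
    "Population/mil": [0.5, 2.1, 1.2, 3.4],
})

pop = homicide_scatter_df["Population/mil"]

# Normalize takes vmin first, then vmax.
norm = colors.Normalize(vmin=pop.min(), vmax=pop.max())

fig, ax = plt.subplots(figsize=(10, 6))
sc = ax.scatter(data=homicide_scatter_df,  # data source for the string arguments
                x="Homicide",
                y="Area Name",
                c="Population/mil",
                cmap="viridis",
                norm=norm,
                s=225)

# Build the bar from the scatter's own mappable so that the colormap
# and the normalisation are shared between the dots and the bar.
cbar = fig.colorbar(sc, ax=ax, label="Population/mil")
fig.savefig("homicide_scatter.png")  # hypothetical output file name
```

If a different colormap such as 'YlGnBu' is wanted, changing cmap in the ax.scatter call is enough; the colorbar follows automatically because it is tied to sc.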