Tags: python, pandas, plot, unique

Pandas DataFrame plot, colors are not unique


According to the pandas manual, the colormap parameter can be used to select colors from a matplotlib colormap object. However, in the case of a bar diagram, the color of each bar seems to need manual selection. That is not practical when there are many bars, because the manual effort becomes tedious. I expected that, if no color is specified, each object/class would get a unique color. Unfortunately, this is not the case: the colors repeat, and only 10 unique colors are provided.

Code for reproduction:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,100,size=(100, 25)), columns=list('ABCDEFGHIJKLMNOPQRSTUVWXY'))
df.set_index('A', inplace=True)
df.plot(kind='bar', stacked=True, figsize=(20, 10))
plt.title("some_name")
plt.savefig("some_name" + '.png')

Does somebody have any idea how to get a unique color for each class in the diagram? Thanks in advance


Solution

  • That's probably because the default property cycle contains only 10 colors, so the colors repeat once there are more than 10 columns.
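
The size of the default cycle can be checked directly. This is a minimal sketch, assuming matplotlib 2.0 or later; the variable names are only illustrative, and the Agg backend is chosen just so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the snippet runs without a display
import matplotlib.pyplot as plt

# Colors matplotlib cycles through when none are given explicitly
active_cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"]
# The list shipped with matplotlib, ignoring any local style overrides
stock_cycle = matplotlib.rcParamsDefault["axes.prop_cycle"].by_key()["color"]

print(len(active_cycle), active_cycle)
print(len(stock_cycle))
```

With stock settings both lists hold 10 entries, which matches the repetition seen with 24 columns.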

    A workaround would be to build a list of random colors (24 in your case) and pass it as the color keyword argument to pandas.DataFrame.plot:

    import random
    
    # One random hex color string (format "#RRGGBB") per column
    list_colors = ["#" + "".join(random.choice("0123456789ABCDEF") for _ in range(6))
                   for _ in range(len(df.columns))]
    
    df.plot(kind="bar", stacked=True, figsize=(20, 10), color=list_colors)
    


    Update:

    Random colors can end up looking alike, and it might be hard to pick 24 clearly distinct colors by hand. However, you can use one of the palettes available in seaborn:


    import seaborn as sns #pip install seaborn
    
    list_colors = sns.color_palette("hsv", n_colors=24)
    
    df.plot(kind="bar", stacked=True, figsize=(20, 10), color=list_colors)
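
If seaborn is not available, a similar list can be sampled straight from a matplotlib colormap. The sketch below is an alternative under assumptions, not part of the original answer: gist_rainbow is just one continuous colormap that happens to work, and n_colors is hard-coded to the 24 columns from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the snippet runs without a display
import matplotlib.pyplot as plt

n_colors = 24  # one color per remaining column in the question's DataFrame
cmap = plt.get_cmap("gist_rainbow")

# Take n_colors evenly spaced samples from the colormap's 0..1 range;
# each sample is an (R, G, B, A) tuple of floats
list_colors = [cmap(i / (n_colors - 1)) for i in range(n_colors)]
```

The list can then be passed as color=list_colors to df.plot, exactly like the seaborn palette. Passing colormap="gist_rainbow" to df.plot should give a comparable result, since pandas samples the colormap across the columns itself.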
    

    Another solution would be to use scipy.spatial.distance.euclidean from scipy to reject any random color that lies too close, in RGB space, to one already chosen:

    import random
    
    import seaborn as sns #pip install seaborn
    from scipy.spatial import distance #pip install scipy
    
    def hex_to_rgb(hex_color):
        # "#RRGGBB" -> (R, G, B) as integers in 0..255
        return tuple(int(hex_color[i:i+2], 16) for i in (1, 3, 5))
    
    def distinct_colors(n):
        colors = []
        while len(colors) < n:
            color = "#" + "".join(random.choice("0123456789ABCDEF") for _ in range(6))
            # Keep the candidate only if it is more than 50 units away
            # from every color chosen so far
            if all(distance.euclidean(hex_to_rgb(color), hex_to_rgb(c)) > 50 for c in colors):
                colors.append(color)
        return colors
    
    colors = distinct_colors(len(df.columns)) #len(df.columns)=24
    sns.palplot(colors) # preview the palette
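
One caveat with a rejection loop like this: if the distance threshold is too large for the requested number of colors, it never terminates, and the result changes on every run. The following is a hedged, standard-library-only variant of the same idea. The function name, the max_tries cap and the seed parameter are illustrative additions, and math.dist requires Python 3.8 or later.

```python
import math
import random

def distinct_colors_bounded(n, min_dist=50, max_tries=10_000, seed=None):
    """Pick n random hex colors whose RGB values are pairwise more than min_dist apart.

    Raises ValueError instead of looping forever when the constraint cannot be met.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    chosen = []                # accepted colors as (r, g, b) tuples
    tries = 0
    while len(chosen) < n:
        if tries >= max_tries:
            raise ValueError(
                f"found only {len(chosen)} of {n} colors after {max_tries} tries"
            )
        tries += 1
        candidate = (rng.randrange(256), rng.randrange(256), rng.randrange(256))
        # Accept the candidate only if it is far enough from every accepted color
        if all(math.dist(candidate, c) > min_dist for c in chosen):
            chosen.append(candidate)
    return ["#{:02X}{:02X}{:02X}".format(*c) for c in chosen]

colors = distinct_colors_bounded(24, seed=0)
```

The returned hex strings can be passed as color=colors to df.plot in the same way as the other lists.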
    
