python matplotlib colorbar

How to create a discrete colorbar from strings for a time-height plot?


The output of my algorithm is a string label for each data point. I need to visualize these labels in a time-height plot, with colors defined by the strings. So far, so good: I convert the strings to a categorical and can choose my colors freely.

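import numpy as np
import pandas as pd
from matplotlib import cm
from matplotlib import pyplot as plt

num_hydrometeor = 8
# Placeholder colormap; it is overwritten by the ListedColormap built further down.
ncar_cmap = plt.get_cmap('gist_ncar_r', num_hydrometeor)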

colors = {'AG':'chocolate','IC':'orange','DN':'yellowgreen','OT':'grey','WS':'r','FZ':'rosybrown','RN':'teal','IP':'cyan',np.nan:'white'}

a = np.linspace(0,18,400)
beam_height_test = np.sort(np.random.choice(a,size=180))
times = pd.date_range('1/1/2020', periods = 288, freq ='5min') 

C = np.array(['WS', 'OT', 'FZ', np.nan, 'AG', 'IC'],dtype=object)
test_dist_hca = np.random.choice(C,size=(len(beam_height_test),len(times)))

test_dist_hca_cat = pd.Series(data=test_dist_hca.flatten()).astype('category')

test_dist_hca_cat = test_dist_hca_cat.cat.codes
test_dist_hca_cat = test_dist_hca_cat.values
test_dist_hca_cat = test_dist_hca_cat.reshape((len(beam_height_test),len(times)))

cols = []
a = pd.Series(data=test_dist_hca.flatten()).sort_values().unique()
for hc in a:
    cols.append(colors[hc])
ncar_cmap = cm.colors.ListedColormap(cols)

levels = np.unique(test_dist_hca_cat)

plt.figure(figsize=(40,10))
plt.pcolormesh(times,beam_height_test,test_dist_hca_cat,cmap=ncar_cmap,norm = cm.colors.BoundaryNorm(levels, ncolors=ncar_cmap.N, clip=False))
plt.colorbar()

plt.savefig("hmc_daily_test.png")

[Image: hmc_daily_test.png]

When I apply the same approach to my real output, it looks like this:

[Image: hmc_daily_test2.png]

Does anyone have an idea what I am doing wrong? The algorithm output comes from a pandas DataFrame and is processed the same way as the pandas.Series in the minimal example.


Solution

  • To find out what's happening, I reduced the array sizes. I also created a scatter plot in which the colors are taken directly from the dictionary, without the detour via .astype('category').

    It seems the nan complicates things somewhat, because it gets category code -1. Therefore, it needs to be treated separately from the rest, and the color range has to start at -1.

    To get the colorbar ticks exactly in the center of each color, the range (-1 to 4 in this case) is divided into 12 equal parts, twice the number of colors, and only the odd-indexed division points are kept as ticks.

    Here is what the final test code looks like:
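
The two points above (nan receiving code -1, and ticks placed at the center of each color band) can be checked without plotting. The following is a minimal sketch on a small made-up sample; the variable names are illustrative and not taken from the answer's code.

```python
import numpy as np
import pandas as pd

# Small hypothetical sample containing a missing value.
labels = pd.Series(np.array(['WS', 'OT', np.nan, 'AG', 'WS'], dtype=object))
cat = labels.astype('category')

# Categories are sorted; nan is not a category and is encoded as -1.
categories = list(cat.cat.categories)
codes = cat.cat.codes.to_numpy()

# One color band per code, including the band for -1 (nan).
n_bands = len(categories) + 1
vmin, vmax = -1, len(categories) - 1

# With a linear norm over [vmin, vmax] the bands have equal width,
# so the tick for band k sits at that band's midpoint.
band_width = (vmax - vmin) / n_bands
tick_positions = vmin + band_width * (np.arange(n_bands) + 0.5)

# Equivalent to the linspace expression used for the colorbar ticks.
same_ticks = np.linspace(vmin, vmax, 2 * n_bands, endpoint=False)[1::2]
```

Because the bands are narrower than one code step, the ticks do not fall on the integer codes themselves, which is why the tick positions are computed rather than set to -1, 0, 1, and so on.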

    from matplotlib import pyplot as plt
    from matplotlib import cm
    import pandas as pd
    import numpy as np
    
    colors = {'AG': 'chocolate', 'IC': 'orange', 'DN': 'yellowgreen', 'OT': 'grey', 'WS': 'r', 'FZ': 'rosybrown',
              'RN': 'teal', 'IP': 'cyan', np.nan: 'white'}
    
    a = np.linspace(0, 18, 25)
    beam_height_test = np.sort(np.random.choice(a, replace=False, size=10))
    times = pd.date_range('1/1/2020', periods=12, freq='5min')
    
    C = np.array(['WS', 'OT', 'FZ', np.nan, 'AG', 'IC'], dtype=object)
    test_dist_hca = np.random.choice(C, size=(len(beam_height_test), len(times)))
    
    plt.figure(figsize=(14, 7))
    plt.scatter(np.tile(times, len(beam_height_test)),
                np.repeat(beam_height_test, len(times)),
                c=[colors[h] for h in test_dist_hca.flatten()])
    for i, x in enumerate(times):
        for j, y in enumerate(beam_height_test):
            plt.text(x, y, test_dist_hca[j][i])
    plt.show()
    
    
    test_dist_hca_cat = pd.Series(data=test_dist_hca.flatten()).astype('category')
    test_dist_hca_cat = test_dist_hca_cat.cat.codes
    test_dist_hca_cat = test_dist_hca_cat.values
    test_dist_hca_cat = test_dist_hca_cat.reshape((len(beam_height_test), len(times)))
    
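    # nan has code -1, so its color must come first in the colormap.
    used_colors = [colors[np.nan]]
    # Sorted unique labels; sort_values() places nan last.
    a = pd.Series(data=test_dist_hca.flatten()).sort_values().unique()
    for hc in a:
        if isinstance(hc, str):  # skip nan, which was already added above
            used_colors.append(colors[hc])
    cmap = cm.colors.ListedColormap(used_colors)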
    
    plt.figure(figsize=(14, 7))
    plt.pcolormesh(times, beam_height_test, test_dist_hca_cat,
                   cmap=cmap,
                   norm=plt.Normalize(vmin=-1, vmax=len(a) - 2))
    cbar = plt.colorbar(ticks=np.linspace(-1, len(a) - 2, 2 * len(a), endpoint=False)[1::2])
    cbar.ax.set_yticklabels(['nan'] + list(a[:-1]))
    
    plt.show()
    

    Here is what the pcolormesh with the colorbar looks like:

    [Image: resulting plot]

    And the corresponding scatter plot with the text annotations:

    [Image: scatter plot]

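    Note that the colors and the names correspond. As explained in the pcolormesh docs, pcolormesh with flat shading ignores the last row and column of the data when X and Y are not one element larger than the mesh in each dimension. Depending on the Matplotlib version, this may differ: newer releases default to shading='auto', which in that case centers each cell on its coordinate instead of dropping data.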
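
One way to avoid losing the last row and column is to pass cell edges instead of cell centers, so that the coordinate arrays are one element longer than the data in each dimension. The sketch below assumes numeric, increasing coordinates; `centers_to_edges` is a hypothetical helper, not a Matplotlib function.

```python
import numpy as np


def centers_to_edges(centers):
    """Return len(centers) + 1 cell edges around sorted 1-D center coordinates.

    Hypothetical helper, not part of Matplotlib. Assumes at least two
    numeric centers in increasing order.
    """
    centers = np.asarray(centers, dtype=float)
    mids = (centers[:-1] + centers[1:]) / 2        # interior edges
    first = centers[0] - (mids[0] - centers[0])    # mirror the first half-cell
    last = centers[-1] + (centers[-1] - mids[-1])  # mirror the last half-cell
    return np.concatenate(([first], mids, [last]))


heights = np.array([0.0, 1.0, 3.0])
height_edges = centers_to_edges(heights)
```

With edges computed for both axes, a call such as `plt.pcolormesh(x_edges, y_edges, C, shading='flat')` draws every cell of `C`. Datetime coordinates would first need to be converted to numbers, for example with `matplotlib.dates.date2num`.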