How to create a discrete colorbar from strings in a time-height plot?

The output of my algorithm is a string for each time and height. I need to visualize these in a time-height plot with colors defined by those strings. So far, so good: I convert the strings to categorical and can choose my colors freely.

import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
import pandas as pd

num_hydrometeor = 8
ncar_cmap = plt.get_cmap('gist_ncar_r', num_hydrometeor)

colors = {'AG':'chocolate','IC':'orange','DN':'yellowgreen','OT':'grey','WS':'r','FZ':'rosybrown','RN':'teal','IP':'cyan',np.nan:'white'}

a = np.linspace(0,18,400)
beam_height_test = np.sort(np.random.choice(a, size=180))
times = pd.date_range('1/1/2020', periods = 288, freq ='5min') 

C = np.array(['WS', 'OT', 'FZ', np.nan, 'AG', 'IC'],dtype=object)
test_dist_hca = np.random.choice(C,size=(len(beam_height_test),len(times)))

test_dist_hca_cat = pd.Series(data=test_dist_hca.flatten()).astype('category')

test_dist_hca_cat = test_dist_hca_cat.cat.codes
test_dist_hca_cat = test_dist_hca_cat.values
test_dist_hca_cat = test_dist_hca_cat.reshape((len(beam_height_test),len(times)))

cols = []
a = pd.Series(data=test_dist_hca.flatten()).sort_values().unique()
for hc in a:
    cols.append(colors[hc])
ncar_cmap = cm.colors.ListedColormap(cols)

levels = np.unique(test_dist_hca_cat)

plt.figure(figsize=(40,10))
plt.pcolormesh(times,beam_height_test,test_dist_hca_cat,cmap=ncar_cmap,norm = cm.colors.BoundaryNorm(levels, ncolors=ncar_cmap.N, clip=False))
plt.colorbar()

plt.savefig("hmc_daily_test.png")

When I apply this to my real output, it looks like this:

Does anyone have an idea what I am doing wrong? The algorithm output comes from a pandas DataFrame and is processed the same way as the pandas.Series in the minimal example.

Solution

To find out what's happening, I reduced the array sizes. I also created a scatter plot whose colors are taken directly from the dictionary, without going through .astype('category').

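The NaN complicates things somewhat, because it gets category code -1, while the strings get codes 0, 1, 2, ... in sorted order. In the question's code, sort_values() puts NaN last, so white ends up at the end of the color list although NaN has the lowest code; the colormap order therefore does not match the codes. In addition, BoundaryNorm(levels, ...) with the six unique codes as boundaries defines only five bins for six colors. NaN therefore needs to be treated separately from the rest, and the color range must start at -1.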
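A minimal sketch of the conversion on four hand-picked values (my own example data, not the question's), showing the sorted categories, the -1 code for NaN, and the order that sort_values().unique() produces:

```python
import numpy as np
import pandas as pd

values = pd.Series(np.array(['WS', 'OT', np.nan, 'AG'], dtype=object))

cat = values.astype('category')
categories = list(cat.cat.categories)  # sorted: ['AG', 'OT', 'WS']; NaN is not a category
codes = cat.cat.codes.tolist()         # [2, 1, -1, 0]: NaN gets -1

# Order used in the question to build the color list: NaN comes last here,
# although its code (-1) is the lowest.
order = values.sort_values().unique()

print(categories, codes, order)
```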

To place each colorbar tick exactly at the center of its color, the colorbar range (-1 to 4 in this case, i.e. six colors) is divided into 2 × 6 = 12 equal parts, and every other tick, starting with the first, is skipped.
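A quick numeric check of that tick arithmetic, assuming six colors (NaN plus five classes, as in the test data) and the same vmin/vmax as the Normalize call in the code: the odd-indexed points of the 12-way split coincide with the midpoints of the six equal color segments.

```python
import numpy as np

n_colors = 6                   # NaN + 5 classes, as in the example
vmin, vmax = -1, n_colors - 2  # -1 .. 4, same as Normalize(vmin=-1, vmax=len(a) - 2)

# Ticks as computed in the answer: 12 equal parts, keep every second point.
ticks = np.linspace(vmin, vmax, 2 * n_colors, endpoint=False)[1::2]

# Midpoints of the six equally wide color segments between vmin and vmax.
width = (vmax - vmin) / n_colors
centers = vmin + (np.arange(n_colors) + 0.5) * width

print(np.allclose(ticks, centers))
```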

Here is the final test code:

from matplotlib import pyplot as plt
from matplotlib import cm
import pandas as pd
import numpy as np

colors = {'AG': 'chocolate', 'IC': 'orange', 'DN': 'yellowgreen', 'OT': 'grey', 'WS': 'r', 'FZ': 'rosybrown',
          'RN': 'teal', 'IP': 'cyan', np.nan: 'white'}

a = np.linspace(0, 18, 25)
beam_height_test = np.sort(np.random.choice(a, replace=False, size=10))
times = pd.date_range('1/1/2020', periods=12, freq='5min')

C = np.array(['WS', 'OT', 'FZ', np.nan, 'AG', 'IC'], dtype=object)
test_dist_hca = np.random.choice(C, size=(len(beam_height_test), len(times)))

# Cross-check: color each point directly from the dictionary and label it with its class
plt.figure(figsize=(14, 7))
plt.scatter(np.tile(times, len(beam_height_test)),
            np.repeat(beam_height_test, len(times)),
            c=[colors[h] for h in test_dist_hca.flatten()])
for i, x in enumerate(times):
    for j, y in enumerate(beam_height_test):
        plt.text(x, y, test_dist_hca[j][i])
plt.show()


# Category codes: the sorted class names get 0, 1, 2, ...; NaN gets -1
test_dist_hca_cat = pd.Series(data=test_dist_hca.flatten()).astype('category')
test_dist_hca_cat = test_dist_hca_cat.cat.codes
test_dist_hca_cat = test_dist_hca_cat.values
test_dist_hca_cat = test_dist_hca_cat.reshape((len(beam_height_test), len(times)))

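# NaN has code -1, so its color must come first in the colormap
used_colors = [colors[np.nan]]
# sort_values() puts NaN last; skip it here because it is already in the list
a = pd.Series(data=test_dist_hca.flatten()).sort_values().unique()
for hc in a:
    if isinstance(hc, str):
        used_colors.append(colors[hc])
cmap = cm.colors.ListedColormap(used_colors)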

plt.figure(figsize=(14, 7))
plt.pcolormesh(times, beam_height_test, test_dist_hca_cat,
               cmap=cmap,
               # codes run from -1 (NaN) to len(a) - 2
               norm=plt.Normalize(vmin=-1, vmax=len(a) - 2))
# one tick in the middle of each of the len(a) color segments
cbar = plt.colorbar(ticks=np.linspace(-1, len(a) - 2, 2 * len(a), endpoint=False)[1::2])
cbar.ax.set_yticklabels(['nan'] + list(a[:-1]))

plt.show()

Here is what the pcolormesh with the colorbar looks like:

And the corresponding scatter plot with the text annotations:

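Note that the colors and the names correspond. As explained in the pcolormesh docs, with shading='flat' (the default in older Matplotlib versions) pcolormesh ignores the last row and column when X and Y are not one element longer than the color array. Since Matplotlib 3.5 the default is shading='auto', which in that case centers each cell on its X/Y coordinate instead, so no data is dropped.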
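A different route, sketched here under my own assumptions rather than as the method above: fix the class order with pd.Categorical(..., categories=...) so the codes no longer depend on which classes happen to occur in the data, shift the codes by one so NaN becomes 0, and use a BoundaryNorm whose boundaries lie halfway between the integer codes, so the colorbar ticks can sit at the integers themselves. The class order (taken from the colors dictionary), the Agg backend, and the output filename are illustrative choices; shading='nearest' needs Matplotlib 3.3 or newer. Note that any string not listed in classes also gets code -1 and is drawn like NaN.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm, ListedColormap
import numpy as np
import pandas as pd

# Fixed class order and colors (string keys only; NaN is handled separately).
colors = {'AG': 'chocolate', 'IC': 'orange', 'DN': 'yellowgreen', 'OT': 'grey',
          'WS': 'r', 'FZ': 'rosybrown', 'RN': 'teal', 'IP': 'cyan'}
classes = list(colors)

heights = np.sort(np.random.choice(np.linspace(0, 18, 25), size=10, replace=False))
times = pd.date_range('1/1/2020', periods=12, freq='5min')
C = np.array(['WS', 'OT', 'FZ', np.nan, 'AG', 'IC'], dtype=object)
hca = np.random.choice(C, size=(len(heights), len(times)))

# Codes follow `classes`, not the data: NaN (and any unknown string) -> -1.
codes = pd.Categorical(hca.ravel(), categories=classes).codes
# Shift by one so NaN becomes 0 and the classes become 1..len(classes).
codes = (codes + 1).reshape(hca.shape)

cmap = ListedColormap(['white'] + [colors[c] for c in classes])
# One bin per integer code: boundaries at -0.5, 0.5, ..., len(classes) + 0.5.
boundaries = np.arange(-0.5, len(classes) + 1, 1.0)
norm = BoundaryNorm(boundaries, cmap.N)

fig, ax = plt.subplots(figsize=(14, 7))
# shading='nearest' centers each cell on its coordinate, so no row/column is dropped.
mesh = ax.pcolormesh(times, heights, codes, cmap=cmap, norm=norm, shading='nearest')
cbar = fig.colorbar(mesh, ax=ax, ticks=np.arange(len(classes) + 1))
cbar.ax.set_yticklabels(['nan'] + classes)
fig.savefig('hmc_daily_test.png')
```

With fixed categories, every class keeps the same color in every plot, even when some classes are absent from a particular day's output.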