According to matplotlib's colormap documentation, the lightness values of their default colormaps are not constant. However, I would like to create a colormap from the HSL color space that has a constant lightness. How can I do that?
I get that generally, it's not that hard to create your own colormaps, but I don't know how to do this while satisfying the lightness criterion. Maybe this can be done by reverse-engineering the code from the colormap documentation?
I think I found a way to do that, based on this post. First of all, working in the HSL color space turned out to be not the best idea for my overal goal, so I switched to HSV instead. With that, I can load the preferred colormap from matplotlib, create a set of RGB colors from it, transform them into HSV, set their color value constant, transform them back into RGB and finally create a colormap from them again (which I can then use for a 2d histogram e.g.).
I need a colormap in HSV with a constant color value because then I can uniquely map colors to the RGB space from the pallet that is spanned by hue and saturation. This in turn would allow me to create a 2d histogram where I could color-code both the counts (via the saturation) and a third variable (via the hue).
In the MWE below for example (slightly changed from here), with a colormap with constant color value, in each bin I could use the saturation to indicate the number of counts (e.g. the lighter the color, the lower the number), and use the hue to indicate the the average z
value. This would allow me to essentially combine the two plots below into one. (There is also this tutorial on adding alpha values to a 2d histogram, but this wouldn't work in this case I think.)
Currently, you still need both plots to get the full picture, because without the histogram for example, you wouldn't be able to tell how significant a certain z
value in a bin might be, as the same color is used independently of how many data points contributed to it (so judging by the color, a bin with only one data point might look just as significant as a bin with the same color but that contains many more data points; thus there is a bias in favor of outliers).
import matplotlib.pyplot as plt
import numpy as np
# make data: correlated + noise
n = 1000
x, y = np.random.uniform(-2, 2, (2, n))
z = np.sqrt(x**2 + y**2) + np.random.uniform(0, 1, n)
bins = 20
fig, axs = plt.subplots(1, 2, figsize=(7, 3), constrained_layout=True)
_, _, _, img = axs[0].hist2d(x, y, bins=bins)
fig.colorbar(img, ax=axs[0])
axs[0].set(xlabel='x', ylabel='y', title='histogram')
sums, xbins, ybins = np.histogram2d(x, y, bins=bins, weights=z)
counts, _, _ = np.histogram2d(x, y, bins=bins)
with np.errstate(divide='ignore', invalid='ignore'):
# suppress possible divide-by-zero warnings
img = axs[1].pcolormesh(xbins, ybins, sums / counts, cmap='inferno')
fig.colorbar(img, ax=axs[1], label='z')
axs[1].set(xlabel='x', ylabel='y', title='weighed by z')
fig.show()
Now that I managed to find a way to create colormaps with constant color value, what remains is figuring out how to have the 2d histogram drawing from a 2d colormap. Since 2d histograms create an instance of a QuadMesh
, and apparently you can set its facecolors, maybe that is a way to go about it, but I haven't figured out how. Below is my implementation of creating the 2d colormap at least:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm
from matplotlib.colors import hsv_to_rgb, rgb_to_hsv, ListedColormap
# make data: correlated + noise
np.random.seed(100)
n = 1000
x, y = np.random.uniform(-2, 2, (2, n))
z = np.sqrt(x**2 + y**2) + np.random.uniform(0, 1, n)
bins = 20
fig, axs = plt.subplots(1, 3, figsize=(8, 3), constrained_layout=True)
_, _, _, img = axs[0].hist2d(x, y, bins=bins)
fig.colorbar(img, ax=axs[0], label='N')
axs[0].set(xlabel='x', ylabel='y', title='histogram')
# creating the colormap
inferno = cm.get_cmap('inferno')
hsv_inferno = rgb_to_hsv(inferno(np.linspace(0, 1, 300))[:, :3])
hsv_inferno[:, 2] = 1
rgb_inferno = hsv_to_rgb(hsv_inferno)
# plotting the data
sums, xbins, ybins = np.histogram2d(x, y, bins=bins, weights=z)
counts, _, _ = np.histogram2d(x, y, bins=bins)
with np.errstate(divide='ignore', invalid='ignore'):
# suppress possible divide-by-zero warnings
img = axs[1].pcolormesh(
xbins, ybins, sums / counts, cmap=ListedColormap(rgb_inferno)
)
axs[1].set(xlabel='x', ylabel='y', title='weighed by z')
# adding the custom colorbar
S, H = np.mgrid[0:1:100j, 0:1:300j]
V = np.ones_like(S)
HSV = np.dstack((H, S, V))
HSV[:, :, 0] = hsv_inferno[:, 0]
# HSV[:, :, 2] = hsv_inferno[:, 2]
RGB = hsv_to_rgb(HSV)
z_min, z_max = np.min(img.get_array()), np.max(img.get_array())
c_min, c_max = np.min(counts), np.max(counts)
axs[2].imshow(
np.rot90(RGB), origin='lower', extent=[c_min, c_max, z_min, z_max],
aspect=14
)
axs[2].set_xlabel("N")
axs[2].set_ylabel("z")
axs[2].yaxis.set_label_position("right")
axs[2].yaxis.tick_right()
# readjusting the axes a bit
fig.show() # necessary to get the proper positions
pos = axs[1].get_position()
pos.x0 += 0.065
pos.x1 += 0.065
axs[1].set_position(pos)
fig.show()
What comes to my mind is to interpolate in the 2D colorspace you already defined. Running the following code after your last example with n=100000
for smoother images.
from scipy import interpolate
z = np.divide(sums, counts, where=counts != 0);
points = np.mgrid[
0:np.max(counts):1j*RGB.shape[0], # use counts for the first axis
0:np.max(z):1j*RGB.shape[1], # use sum in for the second axis
]
# arrange points in a N x 2 array
points = np.stack(points, axis=2).reshape(-1, 2)
# arrange the colors in a N x 3 array
values = RGB.reshape(-1, 3) # use your 2D colormap as values
# Creates an interpolator from (..., 2) to (..., 3)
cmap2d = interpolate.LinearNDInterpolator(
points, values
)
# stack counts and sums in an array of (n1, n2, 2)
cpoints = np.stack([counts, z], axis=2)
# gets an (n1, n2, 3) array
img = cmap2d(cpoints)
# plot the img as a RGB image
plt.imshow(img, extent=[xbins[0], xbins[-1], ybins[0], ybins[-1]])
For logarithmic scale you apply the logarithm to the limits but use equally space grid. When interpolating you use the logarithm of the coordinate.
from scipy import interpolate
z = np.divide(sums, counts, where=counts != 0);
points = np.mgrid[
# apply log to the limits from 1/e to max(count)
-1:np.log(np.max(counts)):1j*RGB.shape[0], # use counts for the first axis
0:np.max(z):1j*RGB.shape[1], # use sum in for the second axis
]
# arrange points in a N x 2 array
points = np.stack(points, axis=2).reshape(-1, 2)
# arrange the colors in a N x 3 array
values = RGB.reshape(-1, 3) # use your 2D colormap as values
# Creates an interpolator from (..., 2) to (..., 3)
cmap2d = interpolate.LinearNDInterpolator(
points, values
)
# stack counts and sums in an array of (n1, n2, 2)
# apply log to the values
cpoints = np.stack([np.log(np.maximum(counts, 1)), z], axis=2)
# gets an (n1, n2, 3) array
img = cmap2d(cpoints)
# plot the img as a RGB image
plt.imshow(img, extent=[xbins[0], xbins[-1], ybins[0], ybins[-1]])