matplotlib histogram density-plot normalize

2d scatter plot - mcolors.Normalize vs mcolors.LogNorm


I'm making a scatter plot (drawn as a 2D histogram), but when I switch between mcolors.Normalize and mcolors.LogNorm the color bars of the two figures are not consistent. I expected the ticks on the linearly normalized color bar to be approximately the same as the log-normalized ticks (e.g. at major intervals of 10^1, 10^2, 10^3, etc.). Is this not the case? In other words, are the two color bars giving me the same answer? Thanks in advance!

import matplotlib.pyplot as plt  # v3.5.2
import seaborn as sns            # v0.11.2
import numpy as np               # v1.22.1
import matplotlib.colors as mcolors

# generate data
x = np.random.normal(size=100000)
y = x * 3 + np.random.normal(size=100000)

def scatter2d(x, y, norm=mcolors.LogNorm):
    """Create a plt.hist2d plot with marginal histograms."""
    ax1 = sns.jointplot(x=x, y=y, marginal_kws={'bins' : 50})
    ax1.fig.set_size_inches(5, 4)
    ax1.ax_joint.cla()
    plt.sca(ax1.ax_joint)
    plt.hist2d(x, y, 50, norm=norm(), cmin=1,
               cmap='plasma', range=None )

    # set up scale bar legend
    cbar_ax = ax1.fig.add_axes([1, 0.1, 0.03, 0.7])
    cb = plt.colorbar(cax=cbar_ax)
    cb.set_label(f'Density of points ({norm.__name__})', fontsize=13)

scatter2d(x, y)


scatter2d(x, y, norm=mcolors.Normalize)



Solution

  • Here is a visualization of what the norm is doing.

    I simplified your code a bit, since Seaborn isn't needed to reproduce this specific issue (you also seem to be running rather old versions, although the main principles stay the same). A fixed seed helps with reproducibility.

    Setting the same ticks on both colorbars shows how the log norm stretches out the lower values and compresses the space for the higher values. As a result, it reveals more detail in the 2D histogram. The curves in the bottom row show this transformation: the normalized output on the y-axis as a function of the input values (in this case the counts in the 2D bins).

    Note that these norms are somewhat unusual functions: unless vmin and vmax are given explicitly, they set a minimum and maximum value (vmin and vmax) internally the first time they are called.
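
The first-call behaviour of vmin and vmax can be checked directly. This is a minimal sketch, assuming a small made-up array of bin counts (the values are illustrative and not taken from the plots):

```python
import numpy as np
import matplotlib.colors as mcolors

# Illustrative bin counts (an assumption, not the histogram from the plots)
counts = np.array([1.0, 10.0, 100.0, 1000.0])

norm = mcolors.LogNorm()             # no vmin/vmax given
before = (norm.vmin, norm.vmax)      # both limits are still unset
norm(counts)                         # the first call autoscales to the data
after = (norm.vmin, norm.vmax)

# Later calls reuse the stored limits, even for data with another range
reused = np.ma.getdata(norm(np.array([10.0])))[0]

print("before:", before)
print("after: ", after)
print("norm(10) with stored limits:", reused)
```

This is why the same norm instance should not be shared between two plots with different data unless identical limits are actually wanted.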

    import matplotlib.pyplot as plt  # v3.8.3
    import numpy as np  # v1.25.2
    import matplotlib.colors as mcolors
    import matplotlib.ticker as mticker
    
    # generate data
    np.random.seed(123)
    x = np.random.normal(size=100000)
    y = x * 3 + np.random.normal(size=100000)
    
    fig, axs = plt.subplots(2, 2, figsize=(18, 10))
    
    for norm, ax0, ax1 in zip([mcolors.LogNorm, mcolors.Normalize], axs[0], axs[1]):
        norm_func = norm()
        hist_vals, _, _, hist_img = ax0.hist2d(x, y, 50, norm=norm_func, cmin=1, cmap='plasma')
        ax0.set_title("Color transformation via " + norm.__name__)
        cb = fig.colorbar(hist_img, ax=ax0)
        cb.ax.yaxis.set_major_locator(mticker.MultipleLocator(200))
        cb.ax.yaxis.set_major_formatter(mticker.ScalarFormatter())
        cb.ax.yaxis.set_minor_locator(mticker.NullLocator())
        cb.ax.set_ylabel("Norm: " + norm.__name__)
    
        nx = np.linspace(1, np.nanmax(hist_vals), 1000)
        ax1.set_title("Transformation via " + norm.__name__)
        ax1.plot(nx, norm_func(nx))
        ax1.scatter(nx[::20], norm_func(nx[::20]), c=nx[::20], norm=norm_func, cmap='plasma', s=20)
        ax1.set_ylabel(norm.__name__)
        ax1.set_xlabel("input value")
    
    plt.tight_layout()
    plt.show()
    

    [Figure: LogNorm vs Normalize]
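
To put numbers on the stretching, the following sketch prints where the decades 1, 10, 100 and 1000 land on each colorbar (0 is the bottom, 1 is the top). The limits vmin=1 and vmax=1000 are assumed for illustration; in the plots they come from the histogram counts.

```python
import numpy as np
import matplotlib.colors as mcolors

# Assumed limits for illustration; hist2d would derive them from the counts
vmin, vmax = 1.0, 1000.0
values = np.array([1.0, 10.0, 100.0, 1000.0])

linear = mcolors.Normalize(vmin=vmin, vmax=vmax)
log = mcolors.LogNorm(vmin=vmin, vmax=vmax)

# Linear: (v - vmin) / (vmax - vmin)
lin_pos = np.ma.getdata(linear(values))
# Log: (log v - log vmin) / (log vmax - log vmin)
log_pos = np.ma.getdata(log(values))

for v, a, b in zip(values, lin_pos, log_pos):
    print(f"value {v:6.0f}: linear position {a:.3f}, log position {b:.3f}")

# The inverse maps a colorbar position back to a data value
mid_linear = np.ma.getdata(linear.inverse(np.array([0.5])))[0]
mid_log = np.ma.getdata(log.inverse(np.array([0.5])))[0]
print("value at the middle of each colorbar:", mid_linear, mid_log)
```

Both colorbars are labelled in the same data units, so they report the same counts. Only the position of a given count along the bar, and therefore its color, differs: the decades are evenly spaced under the log norm and crowded towards the bottom under the linear one.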

    Here is a different visualization (with different input) that prints the count in each cell. Using fewer bins lets the text fit into the cells. With the log norm, the colors are much more varied.

    import matplotlib.pyplot as plt  # v3.8.3
    import numpy as np  # v1.25.2
    import matplotlib.colors as mcolors
    
    # generate data
    np.random.seed(12345)
    x = np.random.normal(size=10000).cumsum()
    y = np.random.normal(size=x.size).cumsum()
    
    fig, axs = plt.subplots(1, 2, figsize=(18, 5))
    
    for norm, ax in zip([mcolors.LogNorm, mcolors.Normalize], axs):
        norm_func = norm()
        hist_vals, xedges, yedges, hist_img = ax.hist2d(x, y, 15, norm=norm_func, cmin=1, cmap='plasma')
        for i in range(hist_vals.shape[0]):
            for j in range(hist_vals.shape[1]):
                if hist_vals[i, j] > 0:
                    color = 'black' if norm_func(hist_vals[i, j]) > 0.6 else 'white'
                    ax.text((xedges[i] + xedges[i + 1]) / 2, (yedges[j] + yedges[j + 1]) / 2, int(hist_vals[i, j]),
                            color=color, ha='center', va='center')
        ax.set_title("Color transformation via " + norm.__name__)
        cb = fig.colorbar(hist_img, ax=ax)
        cb.ax.set_ylabel("Norm: " + norm.__name__)
    
    plt.tight_layout()
    plt.show()
    

    [Figure: LogNorm vs Normalize with counts]