python pandas numpy matplotlib seaborn

Finding distribution of data by bins in matplotlib?


I have data stored in an array that I want to turn into a heatmap of distributions. Each column in the array represents a correlation constant used to generate the data, and each row is one trial for that correlation constant. Every value in the array is a decimal between 0 and 1, and for each correlation I want to find how the values are distributed across 20 or so bins. From what I've tried, a heatmap can only display a single value per cell, so I need a way to get the per-bin distribution for each correlation and plot that as a heatmap. I've already represented this data as histograms, but I also want to show it in a heatmap for visual purposes. Is there a way to get distribution data from an array so that I can store it in a heatmap?

Array representation of the data, for reference:

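import numpy as np

correls = m   # number of correlation constants (columns)
repeats = n   # number of trials per correlation constant (rows)
data1 = np.zeros((repeats, correls))   # np.zeros takes the shape as a single tuple
# want to make a heatmap that has distributions for each correl from 30 bins between 0 and 1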

Solution

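  • If I understand your question correctly, I think this is what you're looking for: compute a histogram of each column with np.histogram, store the counts in a (bins × correlations) array, and display that array with imshow.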

    import numpy as np
    import matplotlib.pyplot as plt
    
    correlations = 10
    trials = 1000
    
    ## Create array of test data
    data = np.zeros((trials, correlations))
    for i in range(correlations):
        data[:,i] = np.random.uniform(low=0., high=1., size=trials)
        
    ## Create histogram values, store in a new array
    num_bins = 30
    hist_arr = np.zeros((num_bins, correlations))
    bin_edges = np.linspace(0, 1, num_bins+1)
    for i in range(correlations):
        # bin_edges already spans 0 to 1, so no separate range argument is needed
        tmp_hist, _ = np.histogram(data[:,i], bins=bin_edges)
        hist_arr[:,i] = tmp_hist
        
    plt.figure(figsize=(4,4))
    plt.imshow(hist_arr, cmap="turbo", aspect="auto")
    plt.gca().set(xlabel="Correlations",
                 ylabel="Bins")
    plt.colorbar()
    plt.show()
    

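As a possible variation on the answer above, the counts can be divided by the number of trials so that each column shows the fraction of trials per bin, and the heatmap can be drawn with pcolormesh so that the vertical axis shows the actual values from 0 to 1 rather than bin indices. This is only a sketch under assumptions: the helper name column_histograms, the Beta-distributed test data, and the "viridis" colormap are illustrative choices, not part of the original question or answer.

```python
import numpy as np
import matplotlib.pyplot as plt


def column_histograms(data, num_bins=30, value_range=(0.0, 1.0)):
    """Hypothetical helper: histogram every column of a 2-D array.

    Returns (fractions, bin_edges), where fractions[b, c] is the share of
    column c's values that fall into bin b.
    """
    data = np.asarray(data, dtype=float)
    bin_edges = np.linspace(value_range[0], value_range[1], num_bins + 1)
    counts = np.stack(
        [np.histogram(data[:, c], bins=bin_edges)[0] for c in range(data.shape[1])],
        axis=1,
    )
    # Dividing by the number of rows makes each column sum to 1,
    # assuming every value lies inside value_range.
    return counts / data.shape[0], bin_edges


# Made-up test data: column c is drawn from Beta(c + 1, 2), so values stay in [0, 1].
rng = np.random.default_rng(0)
trials, correlations = 1000, 10
data = np.column_stack(
    [rng.beta(c + 1, 2, size=trials) for c in range(correlations)]
)

fractions, bin_edges = column_histograms(data, num_bins=30)

fig, ax = plt.subplots(figsize=(5, 4))
# One cell per column, centred on the column index.
col_edges = np.arange(correlations + 1) - 0.5
mesh = ax.pcolormesh(col_edges, bin_edges, fractions, cmap="viridis")
ax.set(xlabel="Correlation index", ylabel="Value")
fig.colorbar(mesh, ax=ax, label="Fraction of trials")
# Call plt.show() here when running interactively.
```

Normalizing per column keeps the colors comparable if the columns ever have different numbers of trials. If raw counts are preferred, multiplying fractions by the number of trials recovers them.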