
Summarize and plot list of ndarrays according to chosen values


I have a list of ndarrays:

list1 = [t1, t2, t3, t4, t5]

Each t is a two-column NumPy array of [class, value] rows, for example:

import numpy as np

t1 = np.array([[10,0.1],[30,0.05],[30,0.1],[20,0.1],[10,0.05],[10,0.05],[0,0.5],[20,0.05],[10,0.0]], np.float64)

t2 = np.array([[0,0.05],[0,0.05],[30,0],[10,0.25],[10,0.2],[10,0.25],[20,0.1],[20,0.05],[10,0.05]], np.float64)

...

Now, for every t in the list, I want the average of the second-column values, grouped by the value in the first column:

t1out = [[0,0.5],[10,(0.1+0.05+0.05+0)/4],[20,(0.1+0.05)/2],[30,0.075]]

t2out = [[0,0.05],[10,0.1875],[20,0.075],[30,0]]

....
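If you would rather stay in NumPy instead of going through Python lists, the grouped average can also be computed with `np.unique(..., return_inverse=True)` and `np.bincount`. This is a minimal sketch, assuming every t is a two-column float array with the class in column 0; `group_mean` is a hypothetical helper name, not something from the question:

```python
import numpy as np

def group_mean(t):
    """Average column 1 of a 2-column array, grouped by the values in column 0.

    Returns a (k, 2) array of [class, mean] rows sorted by class.
    (group_mean is a hypothetical helper written for this example.)
    """
    t = np.asarray(t, dtype=np.float64)
    # Sorted unique classes, plus for each row the index of its class
    classes, inverse = np.unique(t[:, 0], return_inverse=True)
    # Per-class sum of the second column and per-class row count
    sums = np.bincount(inverse, weights=t[:, 1])
    counts = np.bincount(inverse)
    return np.column_stack((classes, sums / counts))

t1 = np.array([[10,0.1],[30,0.05],[30,0.1],[20,0.1],[10,0.05],
               [10,0.05],[0,0.5],[20,0.05],[10,0.0]], np.float64)
t2 = np.array([[0,0.05],[0,0.05],[30,0],[10,0.25],[10,0.2],
               [10,0.25],[20,0.1],[20,0.05],[10,0.05]], np.float64)

list1 = [t1, t2]  # t3 ... t5 are not given in the question
outs = [group_mean(t) for t in list1]
```

Each row of the result is `[class, mean]`, so `group_mean(t1)` gives the same numbers as t1out above. A class that does not occur in a given t simply has no row in that t's result.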

After generating t1out ... tnout, I want to plot, for each t, the probabilities over the classes. The first elements are the classes (0, 10, 20, 30) and the second elements are the probabilities with which these classes occur (e.g. 0.1, 0.7, 0.15, 0). Something like a histogram or a probability distribution drawn as a bar plot:

import matplotlib.pyplot as plt

plt.bar(classes, probabilities)  # general form

plt.bar([item[0] for item in t1out], [item[1] for item in t1out])
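A sketch of that bar plot for several arrays at once, with one subplot per t. It assumes the averaged tables have the `[[class, value], ...]` layout of t1out and that the classes are 10 apart (hence `width=8`; matplotlib's default width of 0.8 would draw very thin bars on this axis). `plot_class_means` and its `normalize` flag are hypothetical. The flag is there because the averages are not guaranteed to sum to 1 (t1out sums to 0.7), so they only form a probability distribution after dividing by their total:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs headless
import matplotlib.pyplot as plt
import numpy as np

def plot_class_means(outs, labels=None, normalize=False):
    """Draw one bar chart per [[class, value], ...] table in `outs`.

    Hypothetical helper. If normalize is True, each table's values are
    divided by their sum so that the bars of that table add up to 1.
    """
    fig, axes = plt.subplots(1, len(outs), sharey=True, squeeze=False,
                             figsize=(3 * len(outs), 3))
    for i, (ax, out) in enumerate(zip(axes[0], outs)):
        out = np.asarray(out, dtype=np.float64)
        classes, values = out[:, 0], out[:, 1]
        if normalize and values.sum() > 0:
            values = values / values.sum()
        # Assumes classes are 10 apart; width=8 leaves a small gap between bars
        ax.bar(classes, values, width=8)
        ax.set_xticks(classes)
        ax.set_xlabel("class")
        ax.set_title(labels[i] if labels else f"t{i + 1}")
    axes[0, 0].set_ylabel("probability")
    return fig, axes

t1out = [[0, 0.5], [10, 0.05], [20, 0.075], [30, 0.075]]
t2out = [[0, 0.05], [10, 0.1875], [20, 0.075], [30, 0.0]]
fig, axes = plot_class_means([t1out, t2out])
fig_n, axes_n = plot_class_means([t1out], normalize=True)
```

To look at the result, switch to an interactive backend and call `plt.show()`, or save it with `fig.savefig(...)`.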

Solution

  • Here's one approach using itertools.groupby:

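    from statistics import mean
    from itertools import groupby
    
    def fun(t):
        # groupby only merges *consecutive* equal keys, so sort by class first
        s = sorted(t, key=lambda x: x[0])
        # for each class k, average the second column of the rows in that group
        return [[k, mean(i[1] for i in v)] for k, v in groupby(s, key=lambda x: x[0])]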
    
    fun(t1)
    
    [[0.0, 0.5],
     [10.0, 0.05],
     [20.0, 0.07500000000000001],
     [30.0, 0.07500000000000001]]
    

    And to apply to all arrays:

    [fun(t) for t in [t1,t2]]
    
    [[[0.0, 0.5],
      [10.0, 0.05],
      [20.0, 0.07500000000000001],
      [30.0, 0.07500000000000001]],
     [[0.0, 0.05], [10.0, 0.1875], [20.0, 0.07500000000000001], [30.0, 0.0]]]
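The answer stops at the averaged tables. To compare all arrays in a single plot, one option is a grouped bar chart: each class gets a slot on the x-axis and each t gets a shifted bar inside that slot. A minimal sketch, assuming `results` is the list returned by `[fun(t) for t in ...]` and that every table lists the same classes in the same order (true for t1 and t2 here, but not guaranteed in general); `plot_grouped` is a hypothetical name:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show plots interactively
import matplotlib.pyplot as plt
import numpy as np

def plot_grouped(results, labels):
    """Side-by-side bars: one group per class, one bar per array.

    Hypothetical helper. `results` is a list of [[class, mean], ...] tables
    such as fun(t) returns; all tables must share the same classes and order.
    """
    classes = np.array([row[0] for row in results[0]])
    x = np.arange(len(classes))        # one slot per class
    width = 0.8 / len(results)         # split each slot among the arrays
    fig, ax = plt.subplots()
    for i, (res, label) in enumerate(zip(results, labels)):
        heights = [row[1] for row in res]
        # shift this array's bars so the whole group is centred on its slot
        offset = (i - (len(results) - 1) / 2) * width
        ax.bar(x + offset, heights, width, label=label)
    ax.set_xticks(x)
    ax.set_xticklabels([f"{c:g}" for c in classes])
    ax.set_xlabel("class")
    ax.set_ylabel("mean value")
    ax.legend()
    return fig, ax

results = [[[0.0, 0.5], [10.0, 0.05], [20.0, 0.075], [30.0, 0.075]],
           [[0.0, 0.05], [10.0, 0.1875], [20.0, 0.075], [30.0, 0.0]]]
fig, ax = plot_grouped(results, ["t1", "t2"])
```

Using integer slots instead of the raw class values keeps the bar widths independent of how far apart the classes are; the class values only appear as tick labels.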