python numpy matplotlib histogram probability-distribution

Summarize and plot list of ndarrays according to chosen values

I have a list of ndarrays:

list1 = [t1, t2, t3, t4, t5]

Each t is a 2-D array of [class, probability] rows, for example:

t1 = np.array([[10,0.1],[30,0.05],[30,0.1],[20,0.1],[10,0.05],[10,0.05],[0,0.5],[20,0.05],[10,0.0]], np.float64)

t2 = np.array([[0,0.05],[0,0.05],[30,0],[10,0.25],[10,0.2],[10,0.25],[20,0.1],[20,0.05],[10,0.05]], np.float64)

...

For each t in the list, I want to group the rows by their first element and average the corresponding second elements:

t1out = [[0,0.5],[10,(0.1+0.05+0.05+0)/4],[20,(0.1+0.05)/2],[30,0.075]]

t2out = [[0,0.05],[10,0.1875],[20,0.075],[30,0]]

....

After generating the averaged outputs (t1out, t2out, ...), I want to plot the probabilities over the classes for each t. The first elements are the classes (0, 10, 20, 30) and the second elements are the probabilities with which these classes occur (e.g. 0.1, 0.7, 0.15, 0). I am looking for something like a histogram or a probability distribution in the form of a bar plot:

plt.bar([classes],[probabilities])

plt.bar([item[0] for item in t1out],[item[1] for item in t1out])

Solution

Here's one approach using itertools.groupby:

from statistics import mean
from itertools import groupby

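def fun(t):
    # groupby only merges consecutive rows, so sort by class (column 0) first
    s = sorted(t, key=lambda x: x[0])
    # for each class k, average the second column of its rows
    return [[k, mean(i[1] for i in v)] for k, v in groupby(s, key=lambda x: x[0])]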

fun(t1)

[[0.0, 0.5],
 [10.0, 0.05],
 [20.0, 0.07500000000000001],
 [30.0, 0.07500000000000001]]
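The sort inside fun matters: itertools.groupby starts a new group every time the key changes, so rows of the same class that are not adjacent end up in separate groups. A minimal sketch with made-up pairs (the name pairs and its values are for illustration only, not taken from the question):

```python
from itertools import groupby

# Made-up (class, probability) pairs; class 10 appears twice, but not contiguously.
pairs = [(10, 0.1), (30, 0.05), (10, 0.05)]

# Without sorting, class 10 is split into two separate groups.
unsorted_keys = [k for k, _ in groupby(pairs, key=lambda p: p[0])]

# After sorting by class, each class forms exactly one group.
by_class = sorted(pairs, key=lambda p: p[0])
sorted_keys = [k for k, _ in groupby(by_class, key=lambda p: p[0])]

print(unsorted_keys)  # [10, 30, 10]
print(sorted_keys)    # [10, 30]
```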

To apply it to every array in the list:

[fun(t) for t in [t1,t2]]

[[[0.0, 0.5],
  [10.0, 0.05],
  [20.0, 0.07500000000000001],
  [30.0, 0.07500000000000001]],
 [[0.0, 0.05], [10.0, 0.1875], [20.0, 0.07500000000000001], [30.0, 0.0]]]
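For the plotting part of the question, one possible sketch is below. It swaps in a NumPy-based grouping (np.unique plus np.bincount) for the groupby approach and draws one bar chart per array. The helper name class_means, the bar width, the axis labels and the output file name class_means.png are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np


def class_means(t):
    """Hypothetical helper: mean of column 1 for each distinct value in column 0."""
    classes, inverse = np.unique(t[:, 0], return_inverse=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=t[:, 1])  # per-class sum of column 1
    counts = np.bincount(inverse)                 # per-class row count
    return classes, sums / counts


t1 = np.array([[10, 0.1], [30, 0.05], [30, 0.1], [20, 0.1], [10, 0.05],
               [10, 0.05], [0, 0.5], [20, 0.05], [10, 0.0]], np.float64)
t2 = np.array([[0, 0.05], [0, 0.05], [30, 0], [10, 0.25], [10, 0.2],
               [10, 0.25], [20, 0.1], [20, 0.05], [10, 0.05]], np.float64)

arrays = {"t1": t1, "t2": t2}

# One subplot per array; squeeze=False keeps axes 2-D even for a single array.
fig, axes = plt.subplots(1, len(arrays), sharey=True, squeeze=False, figsize=(8, 3))
for ax, (name, t) in zip(axes.flat, arrays.items()):
    classes, means = class_means(t)
    ax.bar(classes, means, width=8)  # classes are 10 apart, so width=8 leaves a small gap
    ax.set_title(name)
    ax.set_xlabel("class")
    ax.set_xticks(classes)
axes[0, 0].set_ylabel("mean probability")
fig.tight_layout()
fig.savefig("class_means.png")  # assumed file name; use plt.show() in an interactive session
```

The same loop also works with the output of fun, by passing [item[0] for item in out] and [item[1] for item in out] to ax.bar as in the question.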