Calculating conditional medians of a numpy array


I have a numpy array X of the form

[ A 1 a1 ]
[ A 2 a2 ]
[ A 3 a3 ]
[ B 1 b1 ]
[ B 2 b2 ]
[ B 3 b3 ]
[ B 4 b4 ]
[ C 1 c1 ]
[ C 2 c2 ]
[ C 3 c3 ]
[ C 4 c4 ]
[ C 5 c5 ]

where

  • (A, B, C) correspond to different experiment setups,
  • (1, 2, 3, ...) correspond to independent replications of the experiment, and
  • (a1, b2, etc.) correspond to the measurements made at each replication

I need to reduce this array to include summaries of what happened with each setup of the experiment, e.g. outputting the array

[ A median(a1, a2, a3) ]
[ B median(b1, b2, b3, b4) ]
[ C median(c1, c2, c3, c4, c5) ]

I would like to do this without having to indicate

i) how many different experiment setups there were, and

ii) how many replications of each experiment were performed.

I suspect this should be possible with some sort of masking, e.g. something like median(X[:,2] such that X[:,0] = a), iterating over a in some way, but I'm not sure of the syntax for doing so.


Solution

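    import numpy as np

    # Distinct experiment setups found in column 0; no need to know how many there are
    experiments = np.unique(X[:, 0])
    medians = []
    for experiment in experiments:
        # The boolean mask keeps every replication of this setup; column 2 holds the measurements
        measurements = X[X[:, 0] == experiment, 2]
        medians.append([experiment, np.median(measurements)])
    medians = np.array(medians)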
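
To try the masking approach end to end, here is a minimal sketch with made-up sample data. The helper name `group_medians` and the values in `X` are hypothetical, not from the question. It assumes `X` may be stored as a string array (because the setup labels are letters), so the measurement column is cast to float before taking medians; if `X` is already numeric, the cast is harmless.

```python
import numpy as np

def group_medians(X):
    """Group rows by column 0 and return (labels, median of column 2 per label)."""
    labels = np.unique(X[:, 0])
    # Assumption: measurements may be stored as strings, so cast them to float
    values = X[:, 2].astype(float)
    # One boolean mask per label selects that setup's replications
    medians = np.array([np.median(values[X[:, 0] == label]) for label in labels])
    return labels, medians

# Hypothetical data: setup label, replication number, measurement
X = np.array([
    ["A", "1", "1.0"],
    ["A", "2", "2.0"],
    ["A", "3", "4.0"],
    ["B", "1", "3.0"],
    ["B", "2", "5.0"],
    ["B", "3", "7.0"],
    ["B", "4", "9.0"],
    ["C", "1", "10.0"],
    ["C", "2", "20.0"],
    ["C", "3", "30.0"],
    ["C", "4", "40.0"],
    ["C", "5", "50.0"],
])

labels, medians = group_medians(X)
for label, median in zip(labels, medians):
    print(label, median)
```

Returning the labels and medians as two separate arrays keeps the medians numeric; stacking them into a single array, as in the loop above, converts the medians to strings whenever the labels are strings.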