Python to output record associated with median value of a tuple list, determined by numpy

I'm using numpy to find the median value from a list of tuples like this:

print(np.median( [x[1] for x in pairs]) )

The Pairs themselves are collections.namedtuple instances, and individually they look like this:

Pair(hash=u'0x034c9e7f28f136188ebb2a2630c26183b3df90c387490159b411cf7326764341', gas=21000)
Pair(hash=u'0xffda7269775dcd710565c5e0289a2254c195e006f34cafc80c4a3c89f479606e', gas=1000000)
Pair(hash=u'0x90ca439b7daa648fafee829d145adefa1dc17c064f43db77f573da873b641f19', gas=90000)
Pair(hash=u'0x7cba9f140ab0b3ec360e0a55c06f75b51c83b2e97662736523c26259a730007f', gas=40000)
Pair(hash=u'0x92dedff7dab405220c473aefd12e2e41d260d2dff7816c26005f78d92254aba2', gas=21000)

This is the method where I determine the median:

pairs = list(_as_pairs(dict_hash_gas))
# pprint.pprint(pairs)
if pairs:
    # Avoid a ValueError from min() and max() if the list is empty.
    print(min(pairs, key=lambda pair: pair.gas))
    print(max(pairs, key=lambda pair: pair.gas))
    print(np.median( [x[1] for x in pairs]) )

Here is how the structure is created:

def _as_pairs(pairs):
    for pair in pairs:
        # TODO:  Verify the dict contains exactly one item?
        for k, v in pair.items():
            # Should the `key` string also be an integer?
            #yield Pair(key=int(k, base=16), value=int(v))
            yield Pair(hash=k, gas=int(v))

The full script can be found here.

At the moment the output is like this:

Pair(hash=u'0xf4f034e23b4118cb4aa4e9d077f0f28d675e25e9dc2650225f32ac33e04c93aa', gas=21000)
Pair(hash=u'0x92de9056a6357752a46dff1d6ff274d204d450bbd6c51cefe757f199af105cb4', gas=4712388)
90000.0

The question is, how could I output the entire record, the entire Pair, associated with the median value, as opposed to just the median value itself?

Solution

You can get the index of the median Pair, but it takes a few more lines:

1) If you always have len(pairs)%2 == 1, the median is unique and is one of the gas values in pairs:

gases = np.array([pair.gas for pair in pairs])
medianGasIndex = np.where( gases == np.median(gases) )[0][0]
print(pairs[medianGasIndex])
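
To make case 1 concrete, here is a minimal self-contained sketch. The `Pair` definition is an assumption inferred from the records printed in the question, the gas values are the five from the question's sample, and the hashes are short placeholders rather than the real ones.

```python
from collections import namedtuple

import numpy as np

# Assumed definition, inferred from the records printed in the question.
Pair = namedtuple("Pair", ["hash", "gas"])

# Gas values from the question's sample; the hashes are placeholders.
pairs = [
    Pair("0xa", 21000),
    Pair("0xb", 1000000),
    Pair("0xc", 90000),
    Pair("0xd", 40000),
    Pair("0xe", 21000),
]

gases = np.array([pair.gas for pair in pairs])
# With an odd number of records the median is one of the gas values,
# so the equality test matches at least one position.
median_gas_index = int(np.where(gases == np.median(gases))[0][0])
median_pair = pairs[median_gas_index]
print(median_pair)
```

If several records share the median gas, `[0][0]` picks the first one in list order.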

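2) If len(pairs)%2 == 0 is possible, np.median returns the mean of the two middle values, which is usually not in the dataset, so the equality test above finds nothing and you have to choose:

2.1) Either you want the Pair whose gas is nearest to the true median (i.e. the 50th percentile, which is usually not contained in the dataset)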

# Note: NumPy 1.22 and later rename the `interpolation` keyword to `method`.
medianGasIndex = np.where( gases == np.percentile(gases,50,interpolation='nearest') )[0][0]

2.2) or you want both the left and right median values

leftMedianGasIndex = np.where( gases == np.percentile(gases,50,interpolation='lower') )[0][0]
rightMedianGasIndex = np.where( gases == np.percentile(gases,50,interpolation='higher') )[0][0]
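
If you would rather not depend on the percentile keyword, here is a rough sketch of a different technique: sort the indices with `np.argsort` and read off the one or two middle positions. The helper name `median_pairs` and the `Pair` definition are assumptions for illustration, and the hashes are placeholders.

```python
from collections import namedtuple

import numpy as np

# Assumed definition, inferred from the records printed in the question.
Pair = namedtuple("Pair", ["hash", "gas"])


def median_pairs(pairs):
    """Return the (left, right) median records of a non-empty list.

    For an odd number of records both entries are the same record.
    """
    gases = np.array([pair.gas for pair in pairs])
    # Indices that would sort the gas values; "stable" keeps ties in list order.
    order = np.argsort(gases, kind="stable")
    n = len(pairs)
    left = pairs[int(order[(n - 1) // 2])]
    right = pairs[int(order[n // 2])]
    return left, right


# Four records, so the true median falls between two of them.
pairs = [
    Pair("0xa", 21000),
    Pair("0xb", 1000000),
    Pair("0xc", 90000),
    Pair("0xd", 40000),
]
left, right = median_pairs(pairs)
print(left)
print(right)
```

Because this works on positions rather than on value equality, it also behaves sensibly when the gas values are floats.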

It works with this minimal working example; just adjust how the median value is computed according to your needs.
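
For case 2.1, another possible sketch picks the record whose gas is closest to `np.median` by using `np.argmin` on the absolute distance, instead of `np.percentile`. The helper name `nearest_to_median` and the `Pair` definition are assumptions, the hashes are placeholders, and the list is assumed to be non-empty.

```python
from collections import namedtuple

import numpy as np

# Assumed definition, inferred from the records printed in the question.
Pair = namedtuple("Pair", ["hash", "gas"])


def nearest_to_median(pairs):
    """Return the record whose gas is closest to the median gas."""
    gases = np.array([pair.gas for pair in pairs], dtype=float)
    distance = np.abs(gases - np.median(gases))
    # argmin returns the first index when several distances tie.
    return pairs[int(np.argmin(distance))]


pairs = [
    Pair("0xa", 21000),
    Pair("0xb", 1000000),
    Pair("0xc", 90000),
    Pair("0xd", 40000),
]
nearest = nearest_to_median(pairs)
print(nearest)
```

Keep in mind that with an even number of records the two middle values are equally far from the median, so this returns whichever of them comes first in the list; if you need a fixed rule, choosing the left or right median explicitly is the safer option.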