I'm using numpy to find the median value from a list of tuples like this:
print(np.median( [x[1] for x in pairs]) )
The Pairs themselves come from collections.namedtuple, and individually they look like this:
Pair(hash=u'0x034c9e7f28f136188ebb2a2630c26183b3df90c387490159b411cf7326764341', gas=21000)
Pair(hash=u'0xffda7269775dcd710565c5e0289a2254c195e006f34cafc80c4a3c89f479606e', gas=1000000)
Pair(hash=u'0x90ca439b7daa648fafee829d145adefa1dc17c064f43db77f573da873b641f19', gas=90000)
Pair(hash=u'0x7cba9f140ab0b3ec360e0a55c06f75b51c83b2e97662736523c26259a730007f', gas=40000)
Pair(hash=u'0x92dedff7dab405220c473aefd12e2e41d260d2dff7816c26005f78d92254aba2', gas=21000)
This is the method where I determine the median:
pairs = list(_as_pairs(dict_hash_gas))
# pprint.pprint(pairs)
if pairs:
    # Avoid a ValueError from min() and max() if the list is empty.
    print(min(pairs, key=lambda pair: pair.gas))
    print(max(pairs, key=lambda pair: pair.gas))
    print(np.median([x[1] for x in pairs]))
Here is how the structure is created:
def _as_pairs(pairs):
    for pair in pairs:
        # TODO: Verify the dict contains exactly one item?
        for k, v in pair.items():
            # Should the `key` string also be an integer?
            # yield Pair(key=int(k, base=16), value=int(v))
            yield Pair(hash=k, gas=int(v))
The full script can be found here.
At the moment the output is like this:
Pair(hash=u'0xf4f034e23b4118cb4aa4e9d077f0f28d675e25e9dc2650225f32ac33e04c93aa', gas=21000)
Pair(hash=u'0x92de9056a6357752a46dff1d6ff274d204d450bbd6c51cefe757f199af105cb4', gas=4712388)
90000.0
The question is, how could I output the entire record, the entire Pair, associated with the median value, as opposed to just the median value itself?
You can get the index of the median Pair, but it takes a few more lines:
1) If you always have len(pairs)%2 == 1, the median is unique and is one of the values in pairs:
gases = np.array([pair.gas for pair in pairs])
medianGasIndex = np.where( gases == np.median(gases) )[0][0]
print(pairs[medianGasIndex])
2) If you may have len(pairs)%2 == 0, then you have to choose:
2.1) Either you want the Pair whose value is nearest to the true median (i.e. the 50th percentile, which is not contained in the dataset)
medianGasIndex = np.where( gases == np.percentile(gases,50,interpolation='nearest') )[0][0]
2.2) or you want both the right and left median values
leftMedianGasIndex = np.where( gases == np.percentile(gases,50,interpolation='lower') )[0][0]
rightMedianGasIndex = np.where( gases == np.percentile(gases,50,interpolation='higher') )[0][0]
It works with this minimal working example; just edit the way the median value is obtained according to your needs.
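For reference, here is a self-contained sketch that covers both the odd and the even case in one place. Note that it swaps in a different technique from the np.where lookups: it sorts the indices by gas with np.argsort and picks the middle one (odd length) or the two middle ones (even length), so it does not depend on the percentile keyword. The helper name median_pairs and the shortened hashes are made up for illustration; only Pair, hash and gas come from the question.

```python
from collections import namedtuple

import numpy as np

Pair = namedtuple("Pair", ["hash", "gas"])


def median_pairs(pairs):
    """Return the Pair(s) sitting at the median gas value.

    Hypothetical helper: one Pair for an odd-length list, the left and
    right median Pairs for an even-length list, [] for an empty list.
    """
    if not pairs:
        return []
    gases = np.array([pair.gas for pair in pairs])
    # Indices that would sort the gas values; "stable" keeps ties in input order.
    order = np.argsort(gases, kind="stable")
    mid = len(pairs) // 2
    if len(pairs) % 2 == 1:
        return [pairs[order[mid]]]
    return [pairs[order[mid - 1]], pairs[order[mid]]]


# Sample data modelled on the question, with hashes shortened for readability.
pairs = [
    Pair("0x03", 21000),
    Pair("0xff", 1000000),
    Pair("0x90", 90000),
    Pair("0x7c", 40000),
    Pair("0x92", 21000),
]

print(median_pairs(pairs))      # odd length: a single Pair
print(median_pairs(pairs[:4]))  # even length: left and right median Pairs
```

If several Pairs share the median gas value, this returns whichever one lands in the middle position after the stable sort, which is probably fine when you only need one representative record.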