Say I have a pandas Series, and I want to take the mean of every set of 8 rows. I don't know the size of the Series in advance, and the index may not be 0-based. I currently have the following:
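import numpy as np
import pandas as pd

N = 8
s = pd.Series(np.random.random(50 * N))
n_sets = s.shape[0] // N
# Start and end positions of each block of N rows
split = ([m * N for m in range(n_sets)],
         [m * N for m in range(1, n_sets + 1)])
out_array = np.zeros(n_sets)
for i, (a, b) in enumerate(zip(*split)):
    # Select rows a..b-1 by position, then average them
    out_array[i] = s.loc[s.index[a:b]].mean()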
Is there a shorter way to do this?
You could try groupby: group the rows by the floor division of the index by N (this assumes a 0-based integer index, as in the example series), and then use pd.Series.mean():
newout_array = s.groupby(s.index // N).mean().to_list()
Output:
out_array #original solution
[0.42147899 0.55668055 0.5222594 0.46066426 0.44378491 0.52719371
0.42479113 0.46485387 0.2800083 0.57174865 0.59207811 0.58665479
0.52414851 0.38158931 0.51884761 0.59007469 0.3449512 0.56385373
0.34359674 0.44524997 0.44175351 0.42339394 0.5687501 0.3140091
0.40985639 0.46649486 0.3101396 0.45664647 0.51829052 0.38875796
0.45428001 0.52979064 0.62545921 0.64782618 0.65265239 0.56976799
0.64277369 0.33528876 0.45973874 0.45341751 0.52690983 0.66427599
0.59814577 0.35575622 0.62995929 0.61582329 0.38971679 0.4771326
0.50889137 0.25105353]
newout_array #new solution
[0.4214789945860148, 0.5566805507021909, 0.5222593998859411, 0.46066425607167216, 0.4437849132421554, 0.5271937114894408,
0.424791134573943, 0.4648538659945887, 0.28000829556024387, 0.5717486453029332, 0.5920781058695997, 0.5866547941460012,
0.5241485100329547, 0.38158931177460725, 0.5188476113762392, 0.5900746905953183, 0.34495119855714756, 0.5638537286251522,
0.3435967359945349, 0.44524997190104454, 0.44175351484451975, 0.42339393886425913, 0.5687501027416468, 0.3140090963728155,
0.40985639015924036, 0.4664948621046134, 0.3101396034068746, 0.45664647332866076, 0.5182905157666298, 0.38875796468438406,
0.4542800111275337, 0.5297906368971982, 0.6254592119278896, 0.6478261817988752, 0.6526523935382951, 0.569767994485338,
0.642773691835847, 0.3352887578683835, 0.45973873832126594, 0.45341751320112617, 0.5269098312525405, 0.6642759923683706,
0.5981457683986061, 0.3557562229383897, 0.6299592930489117, 0.6158232897272005, 0.38971678834383916, 0.4771325988592886,
0.5088913710936904, 0.25105352820427246]
The only difference is the number of decimals displayed: NumPy prints the array rounded to 8 decimals, while the list shows each float at full precision. If you want only 8 decimals, as in the original out_array, you could map the elements with the round function:
newout_array = s.groupby(s.index // N).mean().to_list()
newout_array = list(map(lambda x: round(x, 8), newout_array))
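Since the question mentions that the index may not be 0-based, and the original loop ignores any leftover rows that do not fill a complete set, here is a minimal sketch of a position-based variant. It groups by row position (np.arange(len(...)) // N) instead of by index label. The example series, its index starting at 100, and the names trimmed, positions and out are assumptions made up for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

N = 8
# Hypothetical data: 50 complete sets plus 3 leftover rows, index starting at 100
n_rows = 50 * N + 3
s = pd.Series(np.random.random(n_rows), index=np.arange(100, 100 + n_rows))

n_sets = len(s) // N
# Keep only the rows that form complete sets, as the loop in the question does
trimmed = s.iloc[:n_sets * N]
# Group by row position rather than by index label
positions = np.arange(len(trimmed)) // N
out = trimmed.groupby(positions).mean().to_numpy()
```

If the trailing partial set should be averaged as well, the iloc trimming step can be skipped and the positions computed on s directly.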