I have a pandas series that I would like to combine in three different ways. The series is as follows:
import pandas as pd
timestamps = [1,1,1,2,3,3,3,4]
quantities = [10,0,2,6,7,2,8,0]
series = pd.Series(quantities, index=timestamps)
Clearly, the timestamps contain three values of 1, one value of 2, three values of 3 and one value of 4. I would like to generate the following series:
1. Sum of the duplicate index values:
pd.Series([12,6,17,0], index=[1,2,3,4])
2. Median of the duplicate index values:
pd.Series([2,6,7,0], index=[1,2,3,4])
3. The number of duplicate index values:
pd.Series([3,1,3,1], index=[1,2,3,4])
In numpy, I would achieve this using a unique_elements_to_indices function:
from typing import Dict

import numpy as np

def unique_elements_to_indices(array: np.ndarray) -> Dict:
    # Map each unique value to the positions where it occurs in the array.
    mapping = {}
    for unique_element in np.unique(array):
        mapping[unique_element] = np.where(array == unique_element)[0]
    return mapping
... and then I would loop through the unique elements and use the np.where indices to locate the quantities for each unique_element.
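The loop described above could be sketched roughly as follows. This is only an illustration of the numpy approach, assuming the timestamps and quantities are held as arrays; the names sums, medians and counts are made up for this example.

```python
import numpy as np

timestamps = np.array([1, 1, 1, 2, 3, 3, 3, 4])
quantities = np.array([10, 0, 2, 6, 7, 2, 8, 0])

def unique_elements_to_indices(array):
    # Map each unique value to the positions where it occurs.
    return {u: np.where(array == u)[0] for u in np.unique(array)}

mapping = unique_elements_to_indices(timestamps)

# Hypothetical result containers, one entry per unique timestamp.
sums = {int(k): int(quantities[idx].sum()) for k, idx in mapping.items()}
medians = {int(k): float(np.median(quantities[idx])) for k, idx in mapping.items()}
counts = {int(k): len(idx) for k, idx in mapping.items()}
```

Each np.where call scans the whole array, so this does one full pass per unique timestamp, which is the cost I am hoping to avoid.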
Is there a way to achieve this quickly in pandas, please?
Thanks.
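You can use the functions sum and median with the parameter level=0 to aggregate by index and get separate outputs (note that the level parameter of these methods was deprecated in pandas 1.3 and removed in 2.0):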
print (series.sum(level=0))
print (series.median(level=0))
More generally, aggregate by index with groupby(level=0) and the function you need:
print (series.groupby(level=0).sum())
print (series.groupby(level=0).median())
# the difference between count and size is that count excludes NaN values
print (series.groupby(level=0).size())
print (series.groupby(level=0).count())
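To make the size/count difference concrete, here is a small sketch on a made-up series that contains a NaN (the series in the question has none, so both methods return the same numbers there):

```python
import numpy as np
import pandas as pd

# Hypothetical data: index 1 appears three times, one of them with a missing value.
s = pd.Series([10.0, np.nan, 2.0, 6.0], index=[1, 1, 1, 2])
grouped = s.groupby(level=0)

sizes = grouped.size()    # counts every row in the group, NaN included
counts = grouped.count()  # counts only the non-NaN values

print(sizes)
print(counts)
```

For index 1, size reports 3 rows while count reports 2, so size is the one that answers "how many duplicate index values are there".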
If you need all of them together in a new DataFrame, use GroupBy.agg with a list of aggregate functions:
print(series.groupby(level=0).agg(['sum', 'median', 'size']))
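As a possible variation, named aggregation (available for a Series groupby since pandas 0.25) lets you choose the output column names yourself. The names total, median and n below are arbitrary choices for this sketch:

```python
import pandas as pd

timestamps = [1, 1, 1, 2, 3, 3, 3, 4]
quantities = [10, 0, 2, 6, 7, 2, 8, 0]
series = pd.Series(quantities, index=timestamps)

# keyword = output column name, value = aggregation to apply
summary = series.groupby(level=0).agg(total='sum', median='median', n='size')
print(summary)
```

Each column of the result is one of the three series asked for in the question, indexed by the unique timestamps.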