
Pandas: combining duplicate index values


I have a pandas series that I would like to combine in three different ways. The series is as follows:

import pandas as pd
timestamps = [1,1,1,2,3,3,3,4]
quantities = [10,0,2,6,7,2,8,0]
series = pd.Series(quantities, index=timestamps)

The index contains the timestamp 1 three times, 2 once, 3 three times and 4 once. I would like to generate the following series:

1. Sum of the duplicate index values:

pd.Series([12,6,17,0], index=[1,2,3,4])

2. Median of the duplicate index values:

pd.Series([2,6,7,0], index=[1,2,3,4])

3. The number of duplicate index values:

pd.Series([3,1,3,1], index=[1,2,3,4])

In NumPy I would do this with a unique_elements_to_indices function:

from typing import Any, Dict
import numpy as np

def unique_elements_to_indices(array: np.ndarray) -> Dict[Any, np.ndarray]:
    # Map each unique element to the positions where it occurs in array
    mapping = {}
    for unique_element in np.unique(array):
        mapping[unique_element] = np.where(array == unique_element)[0]
    return mapping

... and then I would loop through the unique elements, using the stored indices to select the quantities for each one.
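For reference, the loop-based NumPy approach described above can be sketched roughly as follows. This is an illustrative version, not the asker's exact code: it repeats the helper so the block runs on its own, and the names sums, medians and counts are hypothetical. Keys and values are converted to plain Python numbers only so the printed dictionaries are easy to read.

```python
from typing import Any, Dict

import numpy as np

timestamps = np.array([1, 1, 1, 2, 3, 3, 3, 4])
quantities = np.array([10, 0, 2, 6, 7, 2, 8, 0])


def unique_elements_to_indices(array: np.ndarray) -> Dict[Any, np.ndarray]:
    # Map each unique element to the positions where it occurs in array
    mapping = {}
    for unique_element in np.unique(array):
        mapping[unique_element] = np.where(array == unique_element)[0]
    return mapping


mapping = unique_elements_to_indices(timestamps)

# Use the stored positions to select the quantities for each timestamp;
# int()/float() only turn NumPy scalars into plain Python numbers
sums = {int(t): int(quantities[idx].sum()) for t, idx in mapping.items()}
medians = {int(t): float(np.median(quantities[idx])) for t, idx in mapping.items()}
counts = {int(t): len(idx) for t, idx in mapping.items()}

print(sums)     # {1: 12, 2: 6, 3: 17, 4: 0}
print(medians)  # {1: 2.0, 2: 6.0, 3: 7.0, 4: 0.0}
print(counts)   # {1: 3, 2: 1, 3: 3, 4: 1}
```

These match the three target series. Note that each np.where call scans the whole array, so the loop makes one full pass per unique timestamp.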

Is there a quick way to do this in pandas?

Thanks.


Solution

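  • For the separate outputs you can call sum and median with the parameter level=0, which aggregates by the index. This only works in older pandas: the level keyword of these reductions was deprecated in pandas 1.3 and removed in 2.0.

    print(series.sum(level=0))
    print(series.median(level=0))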
    

    More generally, group by the index with groupby(level=0) and then aggregate; this also works in pandas 2.x:

    print(series.groupby(level=0).sum())
    print(series.groupby(level=0).median())
    # size counts every row in each group; count excludes NaN values
    print(series.groupby(level=0).size())

    print(series.groupby(level=0).count())
    

    To get all of the results together in a new DataFrame, pass a list of aggregation functions to GroupBy.agg:

    print(series.groupby(level=0).agg(['sum', 'median', 'size']))
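To illustrate the difference between size and count mentioned in the comment above, here is a small sketch in which one quantity is replaced by NaN. The missing value at timestamp 3 is hypothetical and exists only to show the effect; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

timestamps = [1, 1, 1, 2, 3, 3, 3, 4]
# Hypothetical data: one quantity at timestamp 3 is missing (NaN)
quantities = [10, 0, 2, 6, 7, np.nan, 8, 0]
series = pd.Series(quantities, index=timestamps)

grouped = series.groupby(level=0)
# size counts every row in each group, NaN included
sizes = grouped.size()
# count only counts the non-missing values
counts = grouped.count()
# sum and median skip NaN by default
summary = grouped.agg(['sum', 'median', 'size', 'count'])
print(summary)
```

Here size and count differ only for timestamp 3 (3 versus 2), and that group's sum and median are computed from the two remaining values (15 and 7.5).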