python pandas dataframe series

Expand number of dataframe rows based on sample count values


I have a pandas dataframe with a column for each day of the week, and a column that counts the occurrences of some event:

import pandas as pd

# Initialize the data as a dict of lists.
data = {'Day': ['M', 'T', 'W', 'Th', 'F', 'Sa', 'Su'],
        'Count': [1, 0, 3, 1, 2, 4, 2]}

# Create DataFrame
df = pd.DataFrame(data)
print(df)

outputs:

  Day  Count
0   M      1
1   T      0
2   W      3
3  Th      1
4   F      2
5  Sa      4
6  Su      2

I want to create a Series with one row for every occurrence of the event, so the number of rows equals the sum of the Count column above. For example, if the event took place 3 times on Wednesday and once on Thursday, there would be 3 W rows and 1 Th row. Days with a count of 0 do not appear at all. This is my desired output:

   Day
0    M
1    W
2    W
3    W
4   Th
5    F
6    F
7   Sa
8   Sa
9   Sa
10  Sa
11  Su
12  Su

How can I achieve this?


Solution

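  • One way to do this is with pandas.DataFrame.explode: replace each count with a range of that length, explode it into one row per element, and drop the rows that came from an empty range (explode turns those into NaN):

    out = (df.assign(Count=df['Count'].apply(lambda x: range(1, x+1)))
             .explode('Count')         # one row per element of each range
             .dropna()                 # a count of 0 gives an empty range -> NaN
             .reset_index(drop=True)   # renumber the rows 0..n-1
             [['Day']])                # keep only the Day column, as a DataFrame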
    

    >>> print(out)

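As a possible alternative (not part of the answer above, just a minimal sketch using the same example data), the rows can be repeated directly with Index.repeat, which avoids building intermediate ranges. Each index label is repeated Count times, so a label with a count of 0 is dropped automatically. The variable names `out` and `s` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Day': ['M', 'T', 'W', 'Th', 'F', 'Sa', 'Su'],
                   'Count': [1, 0, 3, 1, 2, 4, 2]})

# Repeat each index label Count times (a count of 0 removes the label),
# select those rows, keep only the Day column, and renumber the rows.
out = df.loc[df.index.repeat(df['Count']), ['Day']].reset_index(drop=True)

# If a Series is wanted rather than a one-column DataFrame,
# Series.repeat does the same thing on the column itself.
s = df['Day'].repeat(df['Count']).reset_index(drop=True)

print(out)
```

Both variants assume Count holds non-negative integers; negative or missing counts would need to be cleaned up first.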