Search code examples
pythonpandasdataframemulti-index

Exploding pandas dataframe by list to multiple rows with a new column for multiindex


I have a Pandas dataframe where the columns are 'month' and 'year', and a 'value_list' which is a list of values - one for each day of the month. Something like this -

year month value_list
1990 2 [10, 20, 30, 40, ...... 290, 300]
1990 3 [110, 120, 130, 140, ...... 390, 400]

I want to unpack these into multiple rows - one for each day (new column), so that I get something like the following -

year month day value
1990 2 1 10
1990 2 2 20
1990 2 3 30
1990 2 4 40

and so on.

I tried using df.explode() but to no avail since the index gets reset or is set to a new single index. How can I automatically get the dates (essentially creating a year, month, date multiindex) while unpacking the lists?


Solution

  • After you explode, you can create the day sequences using groupby-transform with cumcount:

    df = df.explode('value_list').rename(columns={'value_list': 'value'})
    df['day'] = df.groupby(['year', 'month']).transform('cumcount').add(1)
    
    #    year  month value  day
    # 0  1990      2    10    1
    # 0  1990      2    20    2
    # 0  1990      2    30    3
    # 0  1990      2    40    4
    # 0  1990      2   290    5
    # 0  1990      2   300    6
    # 1  1990      3   110    1
    # 1  1990      3   120    2
    # 1  1990      3   130    3
    # 1  1990      3   140    4
    # 1  1990      3   390    5
    # 1  1990      3   400    6
    

    Also as @wwnde commented, if value_list doesn't contain true lists but just strings that look like lists, convert them to lists before exploding:

    df.value_list = df.value_list.str.strip('[]').str.split(r'\s*,\s*')