I have a Pandas dataframe where the columns are 'month' and 'year', and a 'value_list' which is a list of values - one for each day of the month. Something like this -
year | month | value_list |
---|---|---|
1990 | 2 | [10, 20, 30, 40, ...... 290, 300] |
1990 | 3 | [110, 120, 130, 140, ...... 390, 400] |
I want to unpack these into multiple rows - one for each day (new column), so that I get something like the following -
year | month | day | value |
---|---|---|---|
1990 | 2 | 1 | 10 |
1990 | 2 | 2 | 20 |
1990 | 2 | 3 | 30 |
1990 | 2 | 4 | 40 |
and so on.
I tried using df.explode()
but to no avail since the index gets reset or is set to a new single index. How can I automatically get the dates (essentially creating a year, month, date multiindex) while unpacking the lists?
After you explode
, you can create the day
sequences using groupby-transform
with cumcount
:
df = df.explode('value_list').rename(columns={'value_list': 'value'})
df['day'] = df.groupby(['year', 'month']).transform('cumcount').add(1)
# year month value day
# 0 1990 2 10 1
# 0 1990 2 20 2
# 0 1990 2 30 3
# 0 1990 2 40 4
# 0 1990 2 290 5
# 0 1990 2 300 6
# 1 1990 3 110 1
# 1 1990 3 120 2
# 1 1990 3 130 3
# 1 1990 3 140 4
# 1 1990 3 390 5
# 1 1990 3 400 6
Also as @wwnde commented, if value_list
doesn't contain true lists but just strings that look like lists, convert them to lists before exploding:
df.value_list = df.value_list.str.strip('[]').str.split(r'\s*,\s*')