I have a dataframe that looks like this :
+------+------------+-------+--------------+
| name | date | value | replacement |
+------+------------+-------+--------------+
| A | 20/11/2016 | 10 | NaN |
| C | 20/11/2016 | 8 | [A,B] |
| B | 20/11/2016 | 12 | NaN |
| E | 25/12/2016 | 16 | NaN |
| F | 25/12/2016 | 18 | NaN |
| D | 25/12/2016 | 11 | [E,F] |
+------+------------+-------+--------------+
What I would like to do :
For each row that has a list of names in the column 'replacement', I would like its 'value' to be distributed equally over the rows containing those replacements + for the same date.
For the previous example, the output would look like this :
+------+------------+-------+------------------+
| name | date | value | additional value |
+------+------------+-------+------------------+
| A | 20/11/2016 | 10 | 4 |
| B | 20/11/2016 | 12 | 4 |
| A | 25/12/2016 | 16 | 5.5 |
| B | 25/12/2016 | 18 | 5.5 |
+------+------------+-------+------------------+
I managed to find a way to perform the distribution directly without creating a new column by splitting those rows and grouping by name + date but 1/ It was too slow + 2/I do need to create that additional column and can't find a way to do so.
Idea is create new column by lenghts of replacement
lists with Series.str.len
and then DataFrame.explode
(pandas 0.25+) them to scalars. Divide columns value
by new
and merge
by original with another columns names for add original columns:
df1 = df.assign(new=df['replacement'].str.len()).explode('replacement')
df1['new'] = df1['value'].div(df1['new'])
df1 = df1[['name','date','value']].merge(df1[['replacement','date','new']],
left_on=['name','date'],
right_on=['replacement','date'])
df1['replacement'] = df1.pop('new')
print (df1)
name date value replacement
0 A 20/11/2016 10 4.0
1 B 20/11/2016 12 4.0
2 A 25/12/2016 16 5.5
3 B 25/12/2016 18 5.5
Similar solution with droping instead selecting:
df1 = df.assign(new=df['replacement'].str.len()).explode('replacement')
df1['new'] = df1['value'].div(df1['new'])
df1 = df1.drop(['replacement','new'],1).merge(df1.drop(['name','value'],1),
left_on=['name','date'],
right_on=['replacement','date'])
df1['replacement'] = df1.pop('new')
print (df1)
name date value replacement
0 A 20/11/2016 10 4.0
1 B 20/11/2016 12 4.0
2 A 25/12/2016 16 5.5
3 B 25/12/2016 18 5.5