So I have 2 list of dicts..
list_yearly = [
{'name':'john',
'total_year': 107
},
{'name':'cathy',
'total_year':124
},
]
list_monthly = [
{'name':'john',
'month':'Jan',
'total_month': 34
},
{'name':'cathy',
'month':'Jan',
'total_month':78
},
{'name':'john',
'month':'Feb',
'total_month': 73
},
{'name':'cathy',
'month':'Feb',
'total_month':46
},
]
The goal is to get a final dataset which looks like this :
{'name':'john',
'total_year': 107,
'trend':[{'month':'Jan', 'total_month': 34},{'month':'Feb', 'total_month': 73}]
},
{'name':'cathy',
'total_year':124,
'trend':[{'month':'Jan', 'total_month': 78},{'month':'Feb', 'total_month': 46}]
},
Since my dataset is for a large number of students for all the 12 months of a particular year,I am using Pandas for the data munging..This is how I went about :
First combine both the lists into a single dataframe using the name key.
In [5]: df = pd.DataFrame(list_yearly).merge(pd.DataFrame(list_monthly))
In [6]: df
Out[6]:
name total_year month total_month
0 john 107 Jan 34
1 john 107 Feb 73
2 cathy 124 Jan 78
3 cathy 124 Feb 46
Then create a trend column as a dict
ln [7]: df['trend'] = df.apply(lambda x: [x[['month', 'total_month']].to_dict()], axis=1)
In [8]: df
Out[8]:
name total_year month total_month \
0 john 107 Jan 34
1 john 107 Feb 73
2 cathy 124 Jan 78
3 cathy 124 Feb 46
trend
0 [{u'total_month': 34, u'month': u'Jan'}]
1 [{u'total_month': 73, u'month': u'Feb'}]
2 [{u'total_month': 78, u'month': u'Jan'}]
3 [{u'total_month': 46, u'month': u'Feb'}]
And, use to_dict(orient='records')
method of selected columns to convert it back into list of dicts:
In [9]: df[['name', 'total_year', 'trend']].to_dict(orient='records')
Out[9]:
[{'name': 'john',
'total_year': 107,
'trend': [{'month': 'Jan', 'total_month': 34}]},
{'name': 'john',
'total_year': 107,
'trend': [{'month': 'Feb', 'total_month': 73}]},
{'name': 'cathy',
'total_year': 124,
'trend': [{'month': 'Jan', 'total_month': 78}]},
{'name': 'cathy',
'total_year': 124,
'trend': [{'month': 'Feb', 'total_month': 46}]}]
As is evident,the final dataset is not exactly what I want.Instead of the 2 dicts with both the months in it,I instead get 4 dicts with all the months separate.How can i fix this ? I would prefer fixing it within Pandas itself rather than using this final output to again reduce it to the desired state
You should actually use groupby
to group based on name
and total_year
instead of apply
(as second step) and in the groupby you can create the list you want. Example -
df = pd.DataFrame(list_yearly).merge(pd.DataFrame(list_monthly))
def func(group):
result = []
for idx, row in group.iterrows():
result.append({'month':row['month'],'total_month':row['total_month']})
return result
result = df.groupby(['name','total_year']).apply(func).reset_index()
result.columns = ['name','total_year','trend']
result_dict = result.to_dict(orient='records')
Demo -
In [9]: df = pd.DataFrame(list_yearly).merge(pd.DataFrame(list_monthly))
In [10]: df
Out[10]:
name total_year month total_month
0 john 107 Jan 34
1 john 107 Feb 73
2 cathy 124 Jan 78
3 cathy 124 Feb 46
In [13]: def func(group):
....: result = []
....: for idx, row in group.iterrows():
....: result.append({'month':row['month'],'total_month':row['total_month']})
....: return result
....:
In [14]:
In [14]: result = df.groupby(['name','total_year']).apply(func).reset_index()
In [15]: result
Out[15]:
name total_year 0
0 cathy 124 [{'month': 'Jan', 'total_month': 78}, {'month'...
1 john 107 [{'month': 'Jan', 'total_month': 34}, {'month'...
In [19]: result.columns = ['name','total_year','trend']
In [20]: result
Out[20]:
name total_year trend
0 cathy 124 [{'month': 'Jan', 'total_month': 78}, {'month'...
1 john 107 [{'month': 'Jan', 'total_month': 34}, {'month'...
In [21]: result.to_dict(orient='records')
Out[21]:
[{'name': 'cathy',
'total_year': 124,
'trend': [{'month': 'Jan', 'total_month': 78},
{'month': 'Feb', 'total_month': 46}]},
{'name': 'john',
'total_year': 107,
'trend': [{'month': 'Jan', 'total_month': 34},
{'month': 'Feb', 'total_month': 73}]}]