In the given pandas dataframe:
df =
contig pos PI_index hapX_My_Sum hapY_My_Sum hapX_Sp_Sum
0 2 16229767 726 0.0 12.0 3.5
1 2 16229783 726 0.0 12.0 3.5
3 2 16229880 726 0.0 12.0 2.0
4 2 16230491 255 12.0 0.0 0.0
5 2 16230503 255 12.0 0.0 0.0
6 2 16232072 255 11.0 1.0 0.0
7 2 16232072 255 11.0 1.0 0.0
8 2 16232282 3353 11.0 1.0 0.0
9 2 16232444 3353 11.0 1.0 0.0
10 2 16232444 3353 11.0 1.0 0.0
I want to convert this dataframe into a dictionary of dictionaries,
i.e. a defaultdict(dict).
So, I did:
from collections import defaultdict
df_dict = df.to_dict('index')
print(df_dict) # gives me
{0: {'hapY_My_Sum': 12.0, 'hapX_Sp_Sum': 3.5 .....}
All is good, but instead of using the main pandas index I want to use the PI_index values as the keys of the resulting defaultdict(<class 'dict'>), so that I can do downstream analyses.
The print output of the defaultdict should be like:
defaultdict(<class 'dict'>, {'726': {'contig': '2', 'hapX_My_Sum': ['0.0', '0.0', '0.0'], 'hapY_My_Sum': ['12.0', '12.0', '12.0'], ....}, '255':{'contig': '2', 'hapX_My_Sum': [....]....}})
Post edit:
So, downstream I can do something like:
from functools import reduce
from operator import mul

for k in df_dict:
    contig = df_dict[k]['contig']  # the column is named 'contig', not 'chr'
    hapX_My_product = reduce(mul, (float(x) for x in df_dict[k]['hapX_My_Sum']))
Is that what you want?
In [11]: cols = ['contig','PI_index','hapX_My_Sum']
In [12]: df[cols].groupby('PI_index') \
    ...:     .apply(lambda x: x.set_index('PI_index').to_dict('list')) \
    ...:     .to_dict()
Out[12]:
{255: {'contig': [2, 2, 2, 2], 'hapX_My_Sum': [12.0, 12.0, 11.0, 11.0]},
726: {'contig': [2, 2, 2], 'hapX_My_Sum': [0.0, 0.0, 0.0]},
3353: {'contig': [2, 2, 2], 'hapX_My_Sum': [11.0, 11.0, 11.0]}}
Some explanation:
First we generate a dictionary of lists for each PI_index group:
In [87]: df[cols].groupby('PI_index') \
...: .apply(lambda x: x.set_index('PI_index').to_dict('list'))
Out[87]:
PI_index
255 {'contig': [2, 2, 2, 2], 'hapX_My_Sum': [12.0,...
726 {'contig': [2, 2, 2], 'hapX_My_Sum': [0.0, 0.0...
3353 {'contig': [2, 2, 2], 'hapX_My_Sum': [11.0, 11...
dtype: object
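The same per-group step can also be written as an explicit loop, which may be easier to follow than the lambda. This is only a sketch on a small made-up sample that reuses the column names from the question; the name per_group is hypothetical. It drops the grouping column instead of calling set_index inside apply, since newer pandas releases have changed how groupby().apply() treats the grouping column.

```python
import pandas as pd

# Small made-up sample that reuses the question's column names.
df = pd.DataFrame({
    "contig": [2, 2, 2, 2, 2],
    "PI_index": [726, 726, 255, 255, 3353],
    "hapX_My_Sum": [0.0, 0.0, 12.0, 11.0, 11.0],
})

per_group = {}  # hypothetical name, not from the original answer
for pi_index, group in df.groupby("PI_index"):
    # Drop the grouping column so only the value columns remain, then
    # turn each remaining column into a plain Python list.
    per_group[int(pi_index)] = group.drop(columns="PI_index").to_dict("list")

print(per_group)
```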
Now we can convert the resulting Series into a dictionary with .to_dict(); its index (the PI_index values) becomes the keys:
In [88]: df[cols].groupby('PI_index') \
...: .apply(lambda x: x.set_index('PI_index').to_dict('list')) \
...: .to_dict()
Out[88]:
{255: {'contig': [2, 2, 2, 2], 'hapX_My_Sum': [12.0, 12.0, 11.0, 11.0]},
726: {'contig': [2, 2, 2], 'hapX_My_Sum': [0.0, 0.0, 0.0]},
3353: {'contig': [2, 2, 2], 'hapX_My_Sum': [11.0, 11.0, 11.0]}}
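If you need exactly the shape shown in the question (a defaultdict(dict) with string keys, a single contig value per key, and lists of strings for the sums), one possible sketch is below. It assumes that contig is constant within each PI_index block, which holds for the sample data but is an assumption in general; the name products is hypothetical.

```python
from collections import defaultdict
from functools import reduce
from operator import mul

import pandas as pd

# Three columns of the question's data, retyped by hand.
df = pd.DataFrame({
    "contig": [2] * 10,
    "PI_index": [726, 726, 726, 255, 255, 255, 255, 3353, 3353, 3353],
    "hapX_My_Sum": [0.0, 0.0, 0.0, 12.0, 12.0, 11.0, 11.0, 11.0, 11.0, 11.0],
})

df_dict = defaultdict(dict)
for pi_index, group in df.groupby("PI_index"):
    key = str(pi_index)  # string keys, as in the desired output
    # Assumption: contig does not vary within a PI_index block,
    # so the first value of the group is kept as a scalar.
    df_dict[key]["contig"] = str(group["contig"].iloc[0])
    df_dict[key]["hapX_My_Sum"] = [str(v) for v in group["hapX_My_Sum"]]

# Downstream use, following the loop sketched in the question:
# multiply all hapX_My_Sum values within each PI_index block.
products = {
    k: reduce(mul, (float(x) for x in v["hapX_My_Sum"]))
    for k, v in df_dict.items()
}
print(products)
```

Storing the values as strings only mirrors the requested output; keeping them as floats would avoid the float(x) conversion downstream.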