I have a data frame with genomic bins in the following format. Each genomic range is represented as a row, and each cell value is the start of a bin within that range.
       0      1      2      3      4    5  ...  522
0   9248   9249    NaN    NaN    NaN  NaN  ...  NaN
1  17291  17292  17293  17294  17295  NaN  ...  NaN
2  18404  18405  18406  18407    NaN  NaN  ...  NaN
[69 rows x 522 columns]
As you can see, many of the row values are incomplete because some genomic ranges are smaller than others.
I want to build the pairwise combinations of values for every pair of rows. It would be fine (preferable, even) if each pairwise result were stored as a separate data frame.
I want something like this:
0 - 1 Pairwise:
0 1
9248 17291
9248 17292
9248 17293
9248 17294
9248 17295
9249 17291
9249 17292
9249 17293
9249 17294
9249 17295
[10 rows x 2 columns]
0 - 2 Pairwise:
0 2
9248 18404
9248 18405
9248 18406
9248 18407
9249 18404
9249 18405
9249 18406
9249 18407
[8 rows x 2 columns]
I need every value combination for each pairwise row combination. I think I need to use itertools.product() to do this sort of thing but cannot figure out how to write the appropriate loop. Any help is greatly appreciated!
Setup
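import numpy as np
import pandas as pd
try:
    # Newer pandas keeps this private helper here; its location depends on the pandas version.
    from pandas.core.reshape.util import cartesian_product as cp
except ImportError:
    from pandas.tools.util import cartesian_product as cp
df = pd.DataFrame({'0': {0: 9248, 1: 17291, 2: 18404},
                   '1': {0: 9249, 1: 17292, 2: 18405},
                   '2': {0: np.nan, 1: 17293.0, 2: 18406.0},
                   '3': {0: np.nan, 1: 17294.0, 2: 18407.0},
                   '4': {0: np.nan, 1: 17295.0, 2: np.nan},
                   '5': {0: np.nan, 1: np.nan, 2: np.nan},
                   '522': {0: np.nan, 1: np.nan, 2: np.nan}})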
Solution
final = {}
# Use cartesian_product to pair each row's non-NaN values with those of every
# later row, and store each result in `final` under the key (i, j).
for i in range(len(df)):
    for j in range(i + 1, len(df)):
        final[(i, j)] = np.r_[cp([df.iloc[i].dropna(), df.iloc[j].dropna()])].T
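Since the question mentions itertools.product(), here is a rough sketch of the same idea with the standard library only, which avoids relying on the private pandas helper. The names pairwise_products, example and pairs are made up for illustration, and the small frame only mirrors the first three rows from the question. It returns one data frame per row pair, which the question said would be preferable.

```python
from itertools import combinations, product

import numpy as np
import pandas as pd


def pairwise_products(frame):
    """Return {(i, j): DataFrame} with every value pair between rows i and j."""
    # Drop the NaN padding once per row so it never enters a product.
    rows = {label: row.dropna().to_numpy() for label, row in frame.iterrows()}
    result = {}
    # combinations() yields each unordered pair of row labels exactly once.
    for i, j in combinations(rows, 2):
        value_pairs = list(product(rows[i], rows[j]))
        result[(i, j)] = pd.DataFrame(value_pairs, columns=[i, j])
    return result


# Hypothetical small frame mirroring the first three rows of the question.
example = pd.DataFrame([
    [9248, 9249, np.nan, np.nan, np.nan],
    [17291, 17292, 17293, 17294, 17295],
    [18404, 18405, 18406, 18407, np.nan],
])
pairs = pairwise_products(example)
print(pairs[(0, 1)].shape)  # (10, 2)
```

With 69 rows this builds 69 * 68 / 2 = 2346 data frames, so if memory becomes a concern it may be better to process each pair inside the loop instead of storing them all.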
Verification
for k, v in final.items():
    print(k)
    print(v)
(0, 1)
[[ 9248. 17291.]
[ 9248. 17292.]
[ 9248. 17293.]
...,
[ 9249. 17293.]
[ 9249. 17294.]
[ 9249. 17295.]]
(1, 2)
[[ 17291. 18404.]
[ 17291. 18405.]
[ 17291. 18406.]
...,
[ 17295. 18405.]
[ 17295. 18406.]
[ 17295. 18407.]]
(0, 2)
[[ 9248. 18404.]
[ 9248. 18405.]
[ 9248. 18406.]
...,
[ 9249. 18405.]
[ 9249. 18406.]
[ 9249. 18407.]]
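NumPy truncates the printed arrays above, so a quick sanity check is that the block for rows i and j should have (number of non-NaN values in i) * (number of non-NaN values in j) rows. The sketch below rebuilds the (0, 2) block with plain NumPy and checks its shape. The helper name cartesian_pairs is hypothetical, and this only illustrates the repeat/tile idea; it is not necessarily how the pandas helper is implemented.

```python
import numpy as np


def cartesian_pairs(a, b):
    """All (x, y) pairs with x from a and y from b, as a (len(a) * len(b), 2) array."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # repeat() holds each value of a for one full pass over b; tile() cycles through b.
    return np.column_stack([np.repeat(a, len(b)), np.tile(b, len(a))])


row0 = [9248, 9249]                  # non-NaN values of row 0
row2 = [18404, 18405, 18406, 18407]  # non-NaN values of row 2
block = cartesian_pairs(row0, row2)
assert block.shape == (len(row0) * len(row2), 2)
print(block)
```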