My Googlefu has failed me!
I have a Pandas DataFrame
of the form:
Level 1 Level 2 Level 3 Level 4
-------------------------------------
A B C NaN
A B D E
A B D F
G H NaN NaN
G I J K
It basically contains nodes of a graph with the levels depicting an outgoing edge from a level of lower order to a level of a higher order. I want to convert the DataFrame/create a new DataFrame of the form:
A B C D E F G H I J K
---------------------------------------------
A | 0 1 0 0 0 0 0 0 0 0 0
B | 0 0 1 1 0 0 0 0 0 0 0
C | 0 0 0 0 0 0 0 0 0 0 0
D | 0 0 0 0 1 1 0 0 0 0 0
E | 0 0 0 0 0 0 0 0 0 0 0
F | 0 0 0 0 0 0 0 0 0 0 0
G | 0 0 0 0 0 0 0 1 1 0 0
H | 0 0 0 0 0 0 0 0 0 0 0
I | 0 0 0 0 0 0 0 0 0 1 0
J | 0 0 0 0 0 0 0 0 0 0 1
K | 0 0 0 0 0 0 0 0 0 0 0
A cell containing 1
depicts an outgoing edge from the corresponding row to the corresponding column. Is there a Pythonic way to achieve this without loops and conditions in Pandas?
Try this code:
df = pd.DataFrame({'level_1':['A', 'A', 'A', 'G', 'G'], 'level_2':['B', 'B', 'B', 'H', 'I'],
'level_3':['C', 'D', 'D', np.nan, 'J'], 'level_4':[np.nan, 'E', 'F', np.nan, 'K']})
Your input dataframe is:
level_1 level_2 level_3 level_4
0 A B C NaN
1 A B D E
2 A B D F
3 G H NaN NaN
4 G I J K
And the solution is:
# Get unique values from input dataframe and filter out 'nan' values
list_nodes = []
for i_col in df.columns.tolist():
list_nodes.extend(filter(lambda v: v==v, df[i_col].unique().tolist()))
# Initialize your result dataframe
df_res = pd.DataFrame(columns=sorted(list_nodes), index=sorted(list_nodes))
df_res = df_res.fillna(0)
# Get 'index-column' pairs from input dataframe ('nan's are exluded)
list_indexes = []
for i_col in range(df.shape[1]-1):
list_indexes.extend(list(set([tuple(i) for i in df.iloc[:, i_col:i_col+2]\
.dropna(axis=0).values.tolist()])))
# Use 'index-column' pairs to fill the result dataframe
for i_list_indexes in list_indexes:
df_res.set_value(i_list_indexes[0], i_list_indexes[1], 1)
And the final result is:
A B C D E F G H I J K
A 0 1 0 0 0 0 0 0 0 0 0
B 0 0 1 1 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 1 1 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 1 1 0 0
H 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0 0 1 0
J 0 0 0 0 0 0 0 0 0 0 1
K 0 0 0 0 0 0 0 0 0 0 0