Given a pandas dataFrame, how does one convert several numeric columns (where x≠1 denotes the value exists, x=0 denotes it doesn't) into pairwise categorical dataframe? I know it is similar to one-hot decoding but the columns are not exactly one hot.
An example:
df
id A B C D
0 3 0 0 1
1 4 1 0 0
2 1 7 20 0
3 0 0 0 4
4 0 0 0 0
5 0 1 0 0
The result would be: df id match
result
0 A
0 D
1 A
1 B
2 A
2 B
2 C
3 D
5 B
Use DataFrame.stack
with filtering and Index.to_frame
:
s = df.stack()
df = s[s!=0].index.to_frame(index=False).rename(columns={1:'result'})
print (df)
id result
0 0 A
1 0 D
2 1 A
3 1 B
4 2 A
5 2 B
6 2 C
7 3 D
8 5 B
Or if performance is important use numpy.where
for indices by matched values with DataFrame
constructor:
i, c = np.where(df != 0)
df = pd.DataFrame({'id':df.index.values[i],
'result':df.columns.values[c]})
print (df)
id result
0 0 A
1 0 D
2 1 A
3 1 B
4 2 A
5 2 B
6 2 C
7 3 D
8 5 B
EDIT:
For first:
s = df.stack()
df = s[s!=0].reset_index()
df.columns= ['id','result','vals']
print (df)
id result vals
0 0 A 3
1 0 D 1
2 1 A 4
3 1 B 1
4 2 A 1
5 2 B 7
6 2 C 20
7 3 D 4
8 5 B 1
For second:
df = pd.DataFrame({'id':df.index.values[i],
'result':df.columns.values[c],
'vals':df.values[i,c]})