I have an "empty" data frame that looks as follows:
        6807  6809  5341
126293   nan   nan   nan
126294   nan   nan   nan
126295   nan   nan   nan
The column names give me a name_id, whereas the index values give me a file_id. Now I want to search for the file_id and the name_id in separate pandas data frames named pro, cont, and neutral, which look like this:
   file_id  name_id
0   126293     7244
1   126293     4978
2   126293     5112
3   126293     6864
If I find the file_id and name_id in the pro dataframe, I want to fill the corresponding cell of the empty data frame above with 1; when found in cont, with -1; and when found in neutral, with 0. That would give me a result like this, e.g.:
        6807  6809  5341
126293     1    -1     0
126294     0    -1     0
126295     1    -1     1
Does someone know how to get this done?
You can stack your 'empty' df (let's call it df) and merge it against a combination of pro, con and neu (cont and neutral in your question). Then you can rearrange it back into a 2D shape.
Put the votes together into one dataframe:
import pandas as pd

# Tag each frame with its vote value, then combine them into one long frame
votes = pd.concat([pro.assign(v=1), con.assign(v=-1), neu.assign(v=0)])
# name_id should match the type of the column labels in the empty df.
# You may or may not need this cast, depending on the types in your actual df.
votes['name_id'] = votes['name_id'].astype(str)
votes now looks like this (numbers made up by me):
   file_id name_id  v
0   126293    6807  1
1   126293    4978  1
2   126293    5112  1
3   126293    6864  1
0   126295    6809 -1
0   126294    5341  0
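As a minimal sketch of this step on its own, the snippet below builds votes from three tiny frames. The values are made up, and the string cast assumes the empty df has string column labels, which may not be true of your data.

```python
import pandas as pd

# Toy inputs (made-up values, for illustration only)
pro = pd.DataFrame({"file_id": [126293, 126293], "name_id": [6807, 4978]})
con = pd.DataFrame({"file_id": [126295], "name_id": [6809]})
neu = pd.DataFrame({"file_id": [126294], "name_id": [5341]})

# Tag each frame with its vote value and combine them into one long frame
votes = pd.concat(
    [pro.assign(v=1), con.assign(v=-1), neu.assign(v=0)],
    ignore_index=True,  # optional: avoids the repeated 0, 1, ... index labels
)

# Assumption: the empty df has string column labels; skip the cast if they are ints
votes["name_id"] = votes["name_id"].astype(str)

print(votes)
```

Passing ignore_index=True is optional here; it only renumbers the rows, and the merge does not depend on the index of votes.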
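Now we merge it with the stacked df on name_id and file_id (after unstack().reset_index(), level_0 holds the column labels, i.e. name_id, and level_1 holds the index values, i.e. file_id):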
df1 = (df.unstack()
         .reset_index()
         .merge(votes, left_on=['level_0', 'level_1'],
                right_on=['name_id', 'file_id'], how='left')[['level_0', 'level_1', 'v']]
      )
df1 looks like this:
  level_0  level_1    v
0    6807   126293  1.0
1    6807   126294  NaN
2    6807   126295  NaN
3    6809   126293  NaN
4    6809   126294  NaN
5    6809   126295 -1.0
6    5341   126293  NaN
7    5341   126294  0.0
8    5341   126295  NaN
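To see where level_0 and level_1 come from, here is a small sketch of the reshape and the left merge on a 2x2 toy frame. The data and the string column labels are assumptions for illustration; long_df and merged are names made up for this example.

```python
import numpy as np
import pandas as pd

# Toy "empty" frame; string column labels are an assumption
df = pd.DataFrame(np.nan, index=[126293, 126294], columns=["6807", "6809"])

# One row per cell: level_0 is the old column label, level_1 the old index value,
# and the column named 0 holds the (NaN) cell contents
long_df = df.unstack().reset_index()
print(long_df)

votes = pd.DataFrame({"file_id": [126293], "name_id": ["6807"], "v": [1]})

merged = long_df.merge(
    votes,
    left_on=["level_0", "level_1"],
    right_on=["name_id", "file_id"],
    how="left",  # keep every cell of the empty frame, matched or not
)[["level_0", "level_1", "v"]]
print(merged)
```

The key columns have to agree in type on both sides (strings with strings, ints with ints), which is why name_id is cast earlier.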
Now unstack it back:
df1.set_index(['level_1','level_0']).unstack()
output:
v
level_0 5341 6807 6809
level_1
126293 NaN 1.0 NaN
126294 0.0 NaN NaN
126295 NaN NaN -1.0
You get NaNs where there were no votes in any of pro, con or neu. Votes in those dfs for file_id/name_id pairs that were not originally present in df are ignored.
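Putting the steps together, a runnable end-to-end sketch might look like this. All values are made up, and the string column labels are an assumption. Two small variations on the code above: the 'v' column is selected before unstacking, so the result has no extra column level, and a reindex restores the original row and column order.

```python
import numpy as np
import pandas as pd

# Made-up inputs; string column labels in df are an assumption
df = pd.DataFrame(np.nan, index=[126293, 126294, 126295],
                  columns=["6807", "6809", "5341"])
pro = pd.DataFrame({"file_id": [126293], "name_id": [6807]})
con = pd.DataFrame({"file_id": [126295], "name_id": [6809]})
neu = pd.DataFrame({"file_id": [126294, 999], "name_id": [5341, 1]})  # (999, 1) is not in df

votes = pd.concat([pro.assign(v=1), con.assign(v=-1), neu.assign(v=0)])
votes["name_id"] = votes["name_id"].astype(str)

df1 = (df.unstack()
         .reset_index()
         .merge(votes, left_on=["level_0", "level_1"],
                right_on=["name_id", "file_id"], how="left")[["level_0", "level_1", "v"]])

# Selecting the 'v' column before unstacking avoids the extra column level
result = (df1.set_index(["level_1", "level_0"])["v"]
             .unstack()
             .reindex(index=df.index, columns=df.columns))  # restore original order
print(result)
```

One caveat: if the same file_id/name_id pair appears in more than one of pro, con and neu, the merge produces duplicate rows and the final unstack raises an error about duplicate entries, so you would need to decide which vote wins before reshaping.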