I was running some pcmark tests in different machines. In the end i want to consolidate the machine results. I have modified the end result to display. I have tried different forms of merge using pandas but i could not get expected result but this was close enough. Any suggestions would be appreciated
Data frame from machine 1:
|------------|---------------------------|--------------|------------|
|Test Case | SubTest | App | Count |
|------------|---------------------------|--------------|------------|
|pcmark10 | AppStartUp | NaN | NaN |
|pcmark10 | PhotoEditing | NaN | NaN |
|pcmark10 | RenderingAndVisualization | NaN | NaN |
|pcmark10 | Spreadsheet | soffice.bin | 1.0 |
|pcmark10 | VideoConferencing | NaN | NaN |
|pcmark10 | VideoEditing | NaN | NaN |
|pcmark10 | WebBrowsing | NaN | NaN |
|pcmark10 | Writing | NaN | NaN |
|------------|---------------------------|--------------|------------|
Dataframe from machine 2:
|------------|---------------------------|--------------|------------|
|Test Case | SubTest | App | Count |
|------------|---------------------------|--------------|------------|
|pcmark10 | AppStartUp | NaN | NaN |
|pcmark10 | PhotoEditing | NaN | NaN |
|pcmark10 | RenderingAndVisualization | NaN | NaN |
|pcmark10 | Spreadsheet | NaN | NaN |
|pcmark10 | VideoConferencing | NaN | NaN |
|pcmark10 | VideoEditing | NaN | NaN |
|pcmark10 | WebBrowsing | chrome.exe | 2 |
|pcmark10 | Writing | NaN | NaN |
|------------|---------------------------|--------------|------------|
I want the result to look like this:
|------------|---------------------------|--------------|------------|------------|
|Test Case | SubTest | App | Count_x | Count_y |
|------------|---------------------------|--------------|------------|------------|
|pcmark10 | AppStartUp | NaN | NaN | NaN |
|pcmark10 | PhotoEditing | NaN | NaN | NaN |
|pcmark10 | RenderingAndVisualization | NaN | NaN | NaN |
|pcmark10 | Spreadsheet | soffice.bin | 1.0 | NaN |
|pcmark10 | VideoConferencing | NaN | NaN | NaN |
|pcmark10 | VideoEditing | NaN | NaN | NaN |
|pcmark10 | WebBrowsing | chrome.exe | NaN | 2 |
|pcmark10 | Writing | NaN | NaN | NaN |
|------------|---------------------------|--------------|------------|------------|
I tried the outer merge combining all the keys this is what i got. Using outer function lead the row value of pcmark10 with web browsing as blank. Chrome is missing from the app column.
|------------|---------------------------|--------------|------------|------------|
|Test Case | SubTest | App | Count_x | Count_y |
|------------|---------------------------|--------------|------------|------------|
|pcmark10 | AppStartUp | NaN | NaN | NaN |
|pcmark10 | PhotoEditing | NaN | NaN | NaN |
|pcmark10 | RenderingAndVisualization | NaN | NaN | NaN |
|pcmark10 | Spreadsheet | soffice.bin | 1.0 | NaN |
|pcmark10 | VideoConferencing | NaN | NaN | NaN |
|pcmark10 | VideoEditing | NaN | NaN | NaN |
|pcmark10 | WebBrowsing | NaN | NaN | 2 |
|pcmark10 | Writing | NaN | NaN | NaN |
|------------|---------------------------|--------------|------------|------------|
Merge Command:- pd.merge(df1, df2, on=['Test Case', 'SubTest', 'App'], how="outer", indicator=True)
In your case, merge on Test Case
and SubTest
, then use ffill
or bfill
to create App
:
(df1.merge(df2, on=['Test Case', 'SubTest'])
.assign(App=lambda x: x.filter(like='App').bfill(1).iloc[:,0])
.drop(['App_x','App_y'], axis=1)
)
Output:
Test Case SubTest Count_x Count_y App
0 pcmark10 AppStartUp NaN NaN NaN
1 pcmark10 PhotoEditing NaN NaN NaN
2 pcmark10 RenderingAndVisualization NaN NaN NaN
3 pcmark10 Spreadsheet 1.0 NaN soffice.bin
4 pcmark10 VideoConferencing NaN NaN NaN
5 pcmark10 VideoEditing NaN NaN NaN
6 pcmark10 WebBrowsing NaN 2.0 chrome.exe
7 pcmark10 Writing NaN NaN NaN