Search code examples
pythonpandasdataframemergemerging-data

Data Merging Pandas


I was running some pcmark tests in different machines. In the end i want to consolidate the machine results. I have modified the end result to display. I have tried different forms of merge using pandas but i could not get expected result but this was close enough. Any suggestions would be appreciated

Data frame from machine 1:

|------------|---------------------------|--------------|------------|  
|Test Case   | SubTest                   | App          | Count      |  
|------------|---------------------------|--------------|------------|   
|pcmark10    | AppStartUp                | NaN          | NaN        |  
|pcmark10    | PhotoEditing              | NaN          | NaN        |  
|pcmark10    | RenderingAndVisualization | NaN          | NaN        |  
|pcmark10    | Spreadsheet               | soffice.bin  | 1.0        |  
|pcmark10    | VideoConferencing         | NaN          | NaN        |  
|pcmark10    | VideoEditing              | NaN          | NaN        |  
|pcmark10    | WebBrowsing               | NaN          | NaN        |  
|pcmark10    | Writing                   | NaN          | NaN        |  
|------------|---------------------------|--------------|------------|  

Dataframe from machine 2:

|------------|---------------------------|--------------|------------|  
|Test Case   | SubTest                   | App          | Count      |  
|------------|---------------------------|--------------|------------|   
|pcmark10    | AppStartUp                | NaN          | NaN        |  
|pcmark10    | PhotoEditing              | NaN          | NaN        |  
|pcmark10    | RenderingAndVisualization | NaN          | NaN        |  
|pcmark10    | Spreadsheet               | NaN          | NaN        |  
|pcmark10    | VideoConferencing         | NaN          | NaN        |  
|pcmark10    | VideoEditing              | NaN          | NaN        |  
|pcmark10    | WebBrowsing               | chrome.exe   | 2          |  
|pcmark10    | Writing                   | NaN          | NaN        |  
|------------|---------------------------|--------------|------------|  

I want the result to look like this:

|------------|---------------------------|--------------|------------|------------|  
|Test Case   | SubTest                   | App          | Count_x    | Count_y    |
|------------|---------------------------|--------------|------------|------------|
|pcmark10    | AppStartUp                | NaN          | NaN        | NaN        |  
|pcmark10    | PhotoEditing              | NaN          | NaN        | NaN        |  
|pcmark10    | RenderingAndVisualization | NaN          | NaN        | NaN        |  
|pcmark10    | Spreadsheet               | soffice.bin  | 1.0        | NaN        |  
|pcmark10    | VideoConferencing         | NaN          | NaN        | NaN        |  
|pcmark10    | VideoEditing              | NaN          | NaN        | NaN        |    
|pcmark10    | WebBrowsing               | chrome.exe   | NaN        | 2          |  
|pcmark10    | Writing                   | NaN          | NaN        | NaN        |  
|------------|---------------------------|--------------|------------|------------|  

I tried the outer merge combining all the keys this is what i got. Using outer function lead the row value of pcmark10 with web browsing as blank. Chrome is missing from the app column.

|------------|---------------------------|--------------|------------|------------|  
|Test Case   | SubTest                   | App          | Count_x    | Count_y    |
|------------|---------------------------|--------------|------------|------------|
|pcmark10    | AppStartUp                | NaN          | NaN        | NaN        |  
|pcmark10    | PhotoEditing              | NaN          | NaN        | NaN        |  
|pcmark10    | RenderingAndVisualization | NaN          | NaN        | NaN        |  
|pcmark10    | Spreadsheet               | soffice.bin  | 1.0        | NaN        |  
|pcmark10    | VideoConferencing         | NaN          | NaN        | NaN        |  
|pcmark10    | VideoEditing              | NaN          | NaN        | NaN        |    
|pcmark10    | WebBrowsing               | NaN          | NaN        | 2          |  
|pcmark10    | Writing                   | NaN          | NaN        | NaN        |  
|------------|---------------------------|--------------|------------|------------|  

Merge Command:- pd.merge(df1, df2, on=['Test Case', 'SubTest', 'App'], how="outer", indicator=True)


Solution

  • In your case, merge on Test Case and SubTest, then use ffill or bfill to create App:

    (df1.merge(df2, on=['Test Case', 'SubTest'])
        .assign(App=lambda x: x.filter(like='App').bfill(1).iloc[:,0])
        .drop(['App_x','App_y'], axis=1)
    )
    

    Output:

      Test Case                    SubTest  Count_x  Count_y          App
    0  pcmark10                 AppStartUp      NaN      NaN          NaN
    1  pcmark10               PhotoEditing      NaN      NaN          NaN
    2  pcmark10  RenderingAndVisualization      NaN      NaN          NaN
    3  pcmark10                Spreadsheet      1.0      NaN  soffice.bin
    4  pcmark10          VideoConferencing      NaN      NaN          NaN
    5  pcmark10               VideoEditing      NaN      NaN          NaN
    6  pcmark10                WebBrowsing      NaN      2.0   chrome.exe
    7  pcmark10                    Writing      NaN      NaN          NaN