Search code examples
pythonpandaspandas-groupbymulti-index

How do I delete rows which have only one entry in multi-index dataframe?


I have the following type of muti-index dataframe:


import random
col3=[0,0,0,0,2,4,6,0,0,0,100,200,300,400]
col4=[0,0,0,0,4,6,8,0,0,0,200,900,400, 500]

d = {'Unit': [1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6], 
 'Year': [2014, 2015, 2016, 2017, 2015, 2016, 2017, 2017, 2014, 2015, 2014, 2015, 2016, 2017], 'col3' : col3, 'col4' : col4 }
df = pd.DataFrame(data=d)
new_df = df.groupby(['Unit', 'Year']).sum()

           col3  col4     
Unit Year                      
1    2014     0     0      
     2015     0     0       
     2016     0     0      
     2017     0     0      
2    2015     2     4       
     2016     4     6  
     2017     6     8  
3    2017     0     0    
4    2014     0     0      
5    2015     0     0      
6    2014   100   200       
     2015   200   900  
     2016   300   400  
     2017   400   500  

In reality it is larger ofcourse, but this does the job. In this dataframe I want to remove all Units, which have only one year entry. So I want to have this:

           col3  col4     
Unit Year                      
1    2014     0     0      
     2015     0     0       
     2016     0     0      
     2017     0     0      
2    2015     2     4       
     2016     4     6  
     2017     6     8         
6    2014   100   200       
     2015   200   900  
     2016   300   400  
     2017   400   500  

Thank you in advance for help,

Jen


Solution

  • Use GroupBy.transform with any column and test counts with GroupBy.size, compare for not equal by Series.ne and filter by boolean indexing:

    df = new_df[new_df.groupby(level=0)['col3'].transform('size').ne(1)]
    

    Or get values of index by Index.get_level_values and filter by Index.duplicated:

    df = new_df[new_df.index.get_level_values(0).duplicated(keep=False)]
    

    print (df)
               col3  col4
    Unit Year            
    1    2014     0     0
         2015     0     0
         2016     0     0
         2017     0     0
    2    2015     2     4
         2016     4     6
         2017     6     8
    6    2014   100   200
         2015   200   900
         2016   300   400
         2017   400   500