Search code examples
pythonpandasnumpysubset

Create new dataframe by splitting on year pandas


I wish to create new dataframes, with the logic in which they are split is based on year.

Data

df

id  type    Q1 22   Q2 22   Q3 22   Q1 23   Q2 23   Q3 23    
aa  hi      0.2     0.8     0.3     1.1     2.1     0.4  
aa  ok      0.2     0.7     0.3     0.5     1.0     1.7  
aa  hello   2.0     0.1     0.0     0.1     0.1     0.1  

                             
                             

Desired

df1

id  type    Q1 22   Q2 22   Q3 22                
aa  hi      0.2     0.8     0.3              
aa  ok      0.2     0.7     0.3              
aa  hello   2.0     0.1     0.0 

         
                             
                             

df2

id  type    Q1 23   Q2 23   Q3 23               
aa  hi      1.1     2.1     0.4             
aa  ok      0.5     1.0     1.7             
aa  hello   0.1     0.1     0.1 



    

Doing

# sort the dataframe
df.sort_values(by='year', axis=1, inplace=True)

# set the index to be this and don't drop
df.set_index(keys=['year], drop=False,inplace=True)

# get a list of names
new=df['year'].unique().tolist()

#perform a lookup on a 'view' of the dataframe
new2023 = df.loc[df.name=='2023']

I am still researching, any suggestion is helpful.


Solution

  • here is one way to do it

    # filter column where column name don't ends with 3, gives us year 22
    df_new22=df.filter(regex='[^3]$')
    df_new22
    
    
        id  type    Q1 22   Q2 22   Q3 22
    0   aa  hi        0.2     0.8     0.3
    1   aa  ok        0.2     0.7     0.3
    2   aa  hello     2.0     0.1     0.0
    
    # filter columns that don't ends in 2, gives us year 23
    df_new23=df.filter(regex='[^2]$')
    df_new23
    
        id  type    Q1 23   Q2 23   Q3 23
    0   aa  hi        1.1     2.1     0.4
    1   aa  ok        0.5     1.0     1.7
    2   aa  hello     0.1     0.1     0.1