Search code examples
pythonpandasconditional-statementsextractmultiple-columns

pandas select multiple columns conditionally


Suppose I have a dataframe:

C1 V1 C2 V2 Cond
1  2  3  4  X  
5  6  7  8  Y  
9  10 11 12 X

The statements should return: if Cond == X, pick C1 and v1, else pick C2 and V2.

The output dataframe is something like:

C  V 
1  2 
7  8
9  10

** EDIT: To add one more requirement: the number of columns can change but follow some naming pattern. In this case select all columns with "1" in it, else with "2". I think the hard-coded solution might not work.


Solution

  • I try create more general solution with filter and numpy.where, for new column names use extract:

    #if necessary sort columns
    df = df.sort_index(axis=1)
    
    #filter df by 1 and 2
    df1 = df.filter(like='1')
    df2 = df.filter(like='2')
    print (df1)
       C1  V1
    0   1   2
    1   5   6
    2   9  10
    
    print (df2)
       C2  V2
    0   3   4
    1   7   8
    2  11  12
    
    #np.where need same shape of mask as df1 and df2
    mask = pd.concat([df.Cond == 'X']*len(df1.columns), axis=1)
    print (mask)
        Cond   Cond
    0   True   True
    1  False  False
    2   True   True
    
    cols = df1.columns.str.extract('([A-Za-z])', expand=False)
    print (cols)
    Index(['C', 'V'], dtype='object')
    
    print (np.where(mask, df1,df2))
    Index(['C', 'V'], dtype='object')
    [[ 1  2]
     [ 7  8]
     [ 9 10]]
    
    print (pd.DataFrame(np.where(mask, df1, df2), index=df.index, columns=cols))
       C   V
    0  1   2
    1  7   8
    2  9  10