Search code examples
pythonpandassklearn-pandas

How do you match an element in the entire dataframe and return the entire row or index of that particular matched element?


Each row in this dataframe represents an order and executionStatus.x has some info about the order status.

Those executionStatus.x columns are automatically created by flatten_json by amirziai, depending on how many arguments there are. So if there are 3 statuses for one order like in row 0, there will be up to executionStatus.2. Since row 1 and 2 only have one status, it only has values in executionStatus.0.

My problem is I cannot match "ORDER_FULFILLED" because I don't know how many executionStatuses there will be and I would need to write the exact column name like so df[df['executionStatus.0'].str.match('ORDER_FULFILLED')].

         executionStatus.0 executionStatus.1 executionStatus.2  \
0  REQUESTED_AMOUNT_ROUNDED              MEOW   ORDER_FULFILLED   
1           ORDER_FULFILLED               NaN               NaN   
2     NOT_AN_INFUNDING_LOAN               NaN               NaN   

   investedAmount  loanId  requestedAmount  OrderInstructId  
0            50.0   22222             55.0            55555  
1            25.0   33333             25.0            55555  
2             0.0   44444             25.0            55555  

Is there a way to get the entire row or index that matched with "ORDER_FULFILLED" element in the entire dataframe?

Ideally, the matched dataframe should look like this because row 0 and row 1 have ORDER_FULFILLED in the executionStatuses and row 3 does not so it should be excluded. Thanks!

investedAmount  loanId  requestedAmount  OrderInstructId  
0            50.0   22222             55.0            55555  
1            25.0   33333             25.0            55555  

Solution

  • Use df.filter() for getting the similar columns containing executionStatus with a boolean mask:

    df[df.filter(like='executionStatus').eq('ORDER_FULFILLED').any(axis=1)]
    
              executionStatus.0 executionStatus.1 executionStatus.2  \
    0  REQUESTED_AMOUNT_ROUNDED              MEOW   ORDER_FULFILLED   
    1           ORDER_FULFILLED               NaN               NaN   
    
       investedAmount  loanId  requestedAmount  OrderInstructId  
    0              50   22222               55            55555  
    1              25   33333               25            55555  
    

    If you want to delete he execution columns from output, use:

    df.loc[df.filter(like='executionStatus').eq('ORDER_FULFILLED').any(axis=1),\
               df.columns.difference(df.filter(like='executionStatus').columns)