Search code examples
pythonpandasfiltermultiple-conditions

How can I filter multiple dataframes with multiple conditions (pandas)?


I have this dataframe:

Company Version Disp Version complement Value
1 1 0 1 100
1 1 0 2 200
1 2 1 1 300
1 2 1 2 400
2 1 1 1 500
2 1 1 2 600
2 2 1 1 700
2 2 1 2 800
3 1 1 1 900
3 1 1 2 1000
4 1 0 1 1100
4 1 0 2 1200
4 2 0 1 1300
4 2 0 2 1400
4 3 0 1 1500
4 3 0 2 1600
5 1 0 1 1700
5 1 0 2 1800
5 2 0 1 1900
5 2 0 2 2000
5 3 0 1 2100
5 3 0 2 2200
5 4 1 1 2300
5 4 1 2 2400
6 1 0 1 2500
6 1 0 2 2600
6 2 0 1 2700
6 2 0 2 2800
7 1 1 1 400
7 1 1 2 400

I want my dataframe to be filtered with some conditions:

  1. If the column 'Company' value has 'Disp Version' equals to 1 or 0, you need to get the rows that have 'Disp Version' equals to 1 and the max of the column 'Version';
  2. If the column 'Company' value has 'Disp Version' equals only to 1, you need to get max of the column 'Version';
  3. If the column 'Company' value has 'Disp Version' equals only to 0, you need to get max of the column 'Version'.

Furthermore, you need to have for each 'Company' value, the values 1 and 2 for the column 'complement'.

Examples:

For the first condition I need a dataframe like this:

Company Version Disp Version complement Value
1 2 1 1 300
1 2 1 2 400
5 4 1 1 2300
5 4 1 2 2400

For the second condition I need a dataframe like this:

Company Version Disp Version complement Value
2 2 1 1 700
2 2 1 2 800
3 1 1 1 900
3 1 1 2 1000
7 1 1 1 400
7 1 1 2 400

For the third condition I need a dataframe like this:

Company Version Disp Version complement Value
4 3 0 1 1500
4 3 0 2 1600
6 2 0 1 2700
6 2 0 2 2800

I need this output (with is the 3 dataframes together):

Company Version Disp Version complement Value
1 2 1 1 300
1 2 1 2 400
2 2 1 1 700
2 2 1 2 800
3 1 1 1 900
3 1 1 2 1000
4 3 0 1 1500
4 3 0 2 1600
5 4 1 1 2300
5 4 1 2 2400
6 2 0 1 2700
6 2 0 2 2800
7 1 1 1 400
7 1 1 2 400

Solution

  •     1. Get maximum version number for each company
        maxv = df.groupby('Company')['Version'].max()
    
        2. Join with the original dataframe
        merged_df = pd.merge(df, maxv, on=['Company'])
    
        3. Get companies which satisfy the conditions (have both 1 and 0)
        idx = df.groupby('Company').apply(lambda g: g['Disp Version'].sum() < g['Disp Version'].count() and g['Disp Version'].sum() > 0)
        valids = idx.loc[idx ==True]
    
        4. Final result
        df.loc[df['Company'].isin(valids.index) & df['Disp Version'] == 1]