I have this dataframe:
Company | Version | Disp Version | complement | Value |
---|---|---|---|---|
1 | 1 | 0 | 1 | 100 |
1 | 1 | 0 | 2 | 200 |
1 | 2 | 1 | 1 | 300 |
1 | 2 | 1 | 2 | 400 |
2 | 1 | 1 | 1 | 500 |
2 | 1 | 1 | 2 | 600 |
2 | 2 | 1 | 1 | 700 |
2 | 2 | 1 | 2 | 800 |
3 | 1 | 1 | 1 | 900 |
3 | 1 | 1 | 2 | 1000 |
4 | 1 | 0 | 1 | 1100 |
4 | 1 | 0 | 2 | 1200 |
4 | 2 | 0 | 1 | 1300 |
4 | 2 | 0 | 2 | 1400 |
4 | 3 | 0 | 1 | 1500 |
4 | 3 | 0 | 2 | 1600 |
5 | 1 | 0 | 1 | 1700 |
5 | 1 | 0 | 2 | 1800 |
5 | 2 | 0 | 1 | 1900 |
5 | 2 | 0 | 2 | 2000 |
5 | 3 | 0 | 1 | 2100 |
5 | 3 | 0 | 2 | 2200 |
5 | 4 | 1 | 1 | 2300 |
5 | 4 | 1 | 2 | 2400 |
6 | 1 | 0 | 1 | 2500 |
6 | 1 | 0 | 2 | 2600 |
6 | 2 | 0 | 1 | 2700 |
6 | 2 | 0 | 2 | 2800 |
7 | 1 | 1 | 1 | 400 |
7 | 1 | 1 | 2 | 400 |
I want my dataframe to be filtered with some conditions:
Furthermore, you need to have for each 'Company' value, the values 1 and 2 for the column 'complement'.
Examples:
For the first condition I need a dataframe like this:
Company | Version | Disp Version | complement | Value |
---|---|---|---|---|
1 | 2 | 1 | 1 | 300 |
1 | 2 | 1 | 2 | 400 |
5 | 4 | 1 | 1 | 2300 |
5 | 4 | 1 | 2 | 2400 |
For the second condition I need a dataframe like this:
Company | Version | Disp Version | complement | Value |
---|---|---|---|---|
2 | 2 | 1 | 1 | 700 |
2 | 2 | 1 | 2 | 800 |
3 | 1 | 1 | 1 | 900 |
3 | 1 | 1 | 2 | 1000 |
7 | 1 | 1 | 1 | 400 |
7 | 1 | 1 | 2 | 400 |
For the third condition I need a dataframe like this:
Company | Version | Disp Version | complement | Value |
---|---|---|---|---|
4 | 3 | 0 | 1 | 1500 |
4 | 3 | 0 | 2 | 1600 |
6 | 2 | 0 | 1 | 2700 |
6 | 2 | 0 | 2 | 2800 |
I need this output (with is the 3 dataframes together):
Company | Version | Disp Version | complement | Value |
---|---|---|---|---|
1 | 2 | 1 | 1 | 300 |
1 | 2 | 1 | 2 | 400 |
2 | 2 | 1 | 1 | 700 |
2 | 2 | 1 | 2 | 800 |
3 | 1 | 1 | 1 | 900 |
3 | 1 | 1 | 2 | 1000 |
4 | 3 | 0 | 1 | 1500 |
4 | 3 | 0 | 2 | 1600 |
5 | 4 | 1 | 1 | 2300 |
5 | 4 | 1 | 2 | 2400 |
6 | 2 | 0 | 1 | 2700 |
6 | 2 | 0 | 2 | 2800 |
7 | 1 | 1 | 1 | 400 |
7 | 1 | 1 | 2 | 400 |
1. Get maximum version number for each company
maxv = df.groupby('Company')['Version'].max()
2. Join with the original dataframe
merged_df = pd.merge(df, maxv, on=['Company'])
3. Get companies which satisfy the conditions (have both 1 and 0)
idx = df.groupby('Company').apply(lambda g: g['Disp Version'].sum() < g['Disp Version'].count() and g['Disp Version'].sum() > 0)
valids = idx.loc[idx ==True]
4. Final result
df.loc[df['Company'].isin(valids.index) & df['Disp Version'] == 1]