python pandas group-by conditional-statements

Pandas Groupby Show Non-Matching Values

I have the following dataframe:

data = [['123456ABCD234567', 'A'], ['8502', 'A'], 
    ['74523654894WRZI3', 'B'], ['85CGNK6987541236', 'B'], 
    ['WF85Z4HJ95R4CF2V', 'C'], ['VB52FG85RT74DF96', 'C'],
    ['WERTZ852146', 'D'], ['APUNGF', 'D']
   ]
df = pd.DataFrame(data, columns=['CODE', 'STOCK'])
df

    CODE                STOCK
0   123456ABCD234567    A
1   8502                A
2   74523654894WRZI3    B
3   85CGNK6987541236    B
4   WF85Z4HJ95R4CF2V    C
5   VB52FG85RT74DF96    C
6   WERTZ852146         D
7   APUNGF              D

Each stock is part of various codes. The code should have a length of 16 characters. My objective is to filter out any stocks that have no codes attached which are not composed of 16 characters. In this example, stock A has at least one code with a length of 16 characters, so it should be kept. However, stock D has no codes with a length of 16 characters.

I believe this can be accomplished using the groupby-function in Pandas.

Ultimately, I aim at obtaining below output:

    CODE          STOCK
6   WERTZ852146   D
7   APUNGF        D

Many thanks for any suggestions in advance!

Solution

You could create a boolean column for values not matching the condition and groupby.transform with all to identify the STOCK with all non matching rows:

out = df[df['CODE'].str.len().ne(16).groupby(df['STOCK']).transform('all')]

Output:

          CODE STOCK
6  WERTZ852146     D
7       APUNGF     D

Intermediates:

               CODE STOCK  str.len  ne(16)  transform('all')
0  123456ABCD234567     A       16   False             False
1              8502     A        4    True             False
2  74523654894WRZI3     B       16   False             False
3  85CGNK6987541236     B       16   False             False
4  WF85Z4HJ95R4CF2V     C       16   False             False
5  VB52FG85RT74DF96     C       16   False             False
6       WERTZ852146     D       11    True              True
7            APUNGF     D        6    True              True

Using DeMorgan's law you could also run:

out = df[~df['CODE'].str.len().eq(16).groupby(df['STOCK']).transform('any')]

Intermediates:

               CODE STOCK  str.len  eq(16)  transform('any')      ~
0  123456ABCD234567     A       16    True              True  False
1              8502     A        4   False              True  False
2  74523654894WRZI3     B       16    True              True  False
3  85CGNK6987541236     B       16    True              True  False
4  WF85Z4HJ95R4CF2V     C       16    True              True  False
5  VB52FG85RT74DF96     C       16    True              True  False
6       WERTZ852146     D       11   False             False   True
7            APUNGF     D        6   False             False   True

And without groupby you could identify all the STOCK that have at least one match, and reverse select the others with isin:

out = df[~df['STOCK'].isin(df.loc[df['CODE'].str.len().eq(16), 'STOCK'].unique())]

Intermediates:

df.loc[df['CODE'].str.len().eq(16), 'STOCK'].unique()
# array(['A', 'B', 'C'], dtype=object)

~df['STOCK'].isin(df.loc[df['CODE'].str.len().eq(16), 'STOCK'].unique())
# 0    False
# 1    False
# 2    False
# 3    False
# 4    False
# 5    False
# 6     True
# 7     True
# Name: STOCK, dtype: bool