Search code examples
pythonregexif-statementlambdapandas-apply

Taking values with at least one specified values from another column


I have this dataset

custID     documentno    quantity    producttype
1          1281261658       3        accesories
1          1281261658       3        major
1          1281261658       3        accesories
2          0105162445       2        major      
2          0105162445       2        major
3          0100848728       2        major
3          0100848728       2        accesories
4          0106075074       1        major

So, I want to filter the dataset in a way I want documentno's with atleast one 'accesories' in it. So I want to remove documents with only 'major' in it.

custID = {1,1,1,2,2,3,3,4}
documentno = {1281261658,1281261658,1281261658,0105162445,0105162445,0100848728,0100848728,0106075074}
quantity = {3,3,3,2,2,2,2,1}
producttype = {'accesories','major','accesories','major','major','major','accesories','major'}

the final result is:

id     documentno   quantity    producttype
1      1281261658      3        accesories
1      1281261658      3        major
1      1281261658      3        accesories
3      0100848728      2        major
3      0100848728      2        accesories

I tried with if else, but I couldn't do it.

Thanks in advance


Solution

  • Given pd.DataFrame df:

    df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,4],
                       'documentno' : ['1281261658','1281261658','1281261658','0105162445','0105162445','0100848728','0100848728','0106075074'],
                       'quantity' : [3,3,3,2,2,2,2,1],
                       'producttype' : ['accesories','major','accesories','major','major','major','accesories','major']})
    

    Try the following:

    df.loc[df.documentno.isin(df.documentno[df.producttype.eq('accesories')].unique()),:]

    Result is:

       id  documentno  quantity producttype
    0   1  1281261658         3  accesories
    1   1  1281261658         3       major
    2   1  1281261658         3  accesories
    5   3  0100848728         2       major
    6   3  0100848728         2  accesories
    

    As a filter df.documentno[df.producttype.eq('accesories')] is used which returns documentno values that have one or more accesories. I assume that quantity in dataset df is always greatest than 0.

    However if quantity may be zero:

    df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,4],
                       'documentno' : ['1281261658','1281261658','1281261658','0105162445','0105162445','0100848728','0100848728','0106075074'],
                       'quantity' : [3,3,3,2,2,2,0,1],
                       'producttype' : ['accesories','major','accesories','major','major','major','accesories','major']})
    

    just add condition df.quantity.gt(0):

    df.loc[ df.documentno.isin(df.documentno[(df.producttype.eq('accesories') & df.quantity.gt(0))].unique()) ,:]