Search code examples
pythonpandasloops

How to Loop through each element of a loop and filter out conditions in a python dataframe


I have a list of subcategories and a dataframe. I want to filter out the dataframe on the basis of each subcategory of the list.

lst = [7774, 29409, 36611, 77553]

import pandas as pd
data = {'aucctlg_id': [143424, 143424, 143424, 143388, 143388, 143430],
        'catalogversion_id': [1, 1, 1, 1, 1, 1.2],
'Key': [1434241, 1434241, 1434241, 1433881, 1433881, 14343012],
        'item_id': [4501118, 4501130, 4501129, 4501128, 4501127, 4501126],
'catlog_description': ['M&BP PIG IRON FA', 'M&BP PIG IRON FA', 'M&BP PIG IRON FA', 'PIG IRON MIXED OG','PIG IRON MIXED OG', 'P.S JAM & PIG IRON FINES'],
        'catlog_start_date': ['17-05-2024  11:00:00', '17-05-2024  11:00:00', '17-05-2024  11:00:00', '17-05-2024  11:00:00','17-05-2024  11:00:00', '17-05-2024  11:00:00'],
'subcategoryid': [29409, 29409, 29409, 7774, 7774, 36611],
        'quantity': [200, 200, 200, 180, 180, 100],
        'auctionable': ['Y', 'Y', 'Y', 'Y' ,'Y' ,'Y']
}

df = pd.DataFrame(data)
print(df)

I have tried using the following code but I want output as a dataframe here it generates a list and of a single subcategory:

new=[]
for i in range(0, len(lst)):
    mask1 = df['subcategoryid']==(lst[i])
    df2 = df.loc[mask1]
    new.append(df2)

Required Output files, with the filtered data:

df_7774, df_29409, df_36611

Solution

  • You could pre-filter with isin, then use groupby:

    lst = [7774, 29409, 36611, 77553]
    
    out = dict(list(df[df['subcategoryid'].isin(lst)].groupby('subcategoryid')))
    

    Which will create a dictionary of the desired dataframes:

    {7774:    aucctlg_id  catalogversion_id      Key  item_id catlog_description     catlog_start_date  subcategoryid  quantity auctionable
           3      143388                1.0  1433881  4501128  PIG IRON MIXED OG  17-05-2024  11:00:00           7774       180           Y
           4      143388                1.0  1433881  4501127  PIG IRON MIXED OG  17-05-2024  11:00:00           7774       180           Y,
     29409:    aucctlg_id  catalogversion_id      Key  item_id catlog_description     catlog_start_date  subcategoryid  quantity auctionable
            0      143424                1.0  1434241  4501118   M&BP PIG IRON FA  17-05-2024  11:00:00          29409       200           Y
            1      143424                1.0  1434241  4501130   M&BP PIG IRON FA  17-05-2024  11:00:00          29409       200           Y
            2      143424                1.0  1434241  4501129   M&BP PIG IRON FA  17-05-2024  11:00:00          29409       200           Y,
     36611:    aucctlg_id  catalogversion_id       Key  item_id        catlog_description     catlog_start_date  subcategoryid  quantity auctionable
            5      143430                1.2  14343012  4501126  P.S JAM & PIG IRON FINES  17-05-2024  11:00:00          36611       100           Y}
    

    If you want to create files without intermediate:

    lst = [7774, 29409, 36611, 77553]
    
    for k, g in df[df['subcategoryid'].isin(lst)].groupby('subcategoryid'):
        g.to_excel(f'df_{k}.xlsx')