I have a list of subcategories and a dataframe. I want to filter out the dataframe on the basis of each subcategory of the list.
lst = [7774, 29409, 36611, 77553]
import pandas as pd
data = {'aucctlg_id': [143424, 143424, 143424, 143388, 143388, 143430],
'catalogversion_id': [1, 1, 1, 1, 1, 1.2],
'Key': [1434241, 1434241, 1434241, 1433881, 1433881, 14343012],
'item_id': [4501118, 4501130, 4501129, 4501128, 4501127, 4501126],
'catlog_description': ['M&BP PIG IRON FA', 'M&BP PIG IRON FA', 'M&BP PIG IRON FA', 'PIG IRON MIXED OG','PIG IRON MIXED OG', 'P.S JAM & PIG IRON FINES'],
'catlog_start_date': ['17-05-2024 11:00:00', '17-05-2024 11:00:00', '17-05-2024 11:00:00', '17-05-2024 11:00:00','17-05-2024 11:00:00', '17-05-2024 11:00:00'],
'subcategoryid': [29409, 29409, 29409, 7774, 7774, 36611],
'quantity': [200, 200, 200, 180, 180, 100],
'auctionable': ['Y', 'Y', 'Y', 'Y' ,'Y' ,'Y']
}
df = pd.DataFrame(data)
print(df)
I have tried using the following code but I want output as a dataframe here it generates a list and of a single subcategory:
new=[]
for i in range(0, len(lst)):
mask1 = df['subcategoryid']==(lst[i])
df2 = df.loc[mask1]
new.append(df2)
Required Output files, with the filtered data:
df_7774, df_29409, df_36611
You could pre-filter with isin
, then use groupby
:
lst = [7774, 29409, 36611, 77553]
out = dict(list(df[df['subcategoryid'].isin(lst)].groupby('subcategoryid')))
Which will create a dictionary of the desired dataframes:
{7774: aucctlg_id catalogversion_id Key item_id catlog_description catlog_start_date subcategoryid quantity auctionable
3 143388 1.0 1433881 4501128 PIG IRON MIXED OG 17-05-2024 11:00:00 7774 180 Y
4 143388 1.0 1433881 4501127 PIG IRON MIXED OG 17-05-2024 11:00:00 7774 180 Y,
29409: aucctlg_id catalogversion_id Key item_id catlog_description catlog_start_date subcategoryid quantity auctionable
0 143424 1.0 1434241 4501118 M&BP PIG IRON FA 17-05-2024 11:00:00 29409 200 Y
1 143424 1.0 1434241 4501130 M&BP PIG IRON FA 17-05-2024 11:00:00 29409 200 Y
2 143424 1.0 1434241 4501129 M&BP PIG IRON FA 17-05-2024 11:00:00 29409 200 Y,
36611: aucctlg_id catalogversion_id Key item_id catlog_description catlog_start_date subcategoryid quantity auctionable
5 143430 1.2 14343012 4501126 P.S JAM & PIG IRON FINES 17-05-2024 11:00:00 36611 100 Y}
If you want to create files without intermediate:
lst = [7774, 29409, 36611, 77553]
for k, g in df[df['subcategoryid'].isin(lst)].groupby('subcategoryid'):
g.to_excel(f'df_{k}.xlsx')