In the df below, we have two "Mixes" as indicated by the Mix_Name and Mix_ID columns. And within each of these mixes are multiple tracks with unique Track_ID's that contain different territories (see Territories column).
Mix_Name Mix_ID Territories Track_Title Artist Track_ID
Mix Vol 1 abc123 GB | US | MX Cabala Prok & Fitch 12345678
Mix Vol 1 abc123 US | CA Autosave Fedde 9101112
Mix Vol 1 abc123 US | CA 4 Life Firebeatz 4151617
Mix Vol 1 abc123 IE | MX 4 Life Firebeatz 18192021
Mix Vol 2 xyz456 CA Warrior My Nu Leng 22232425
Mix Vol 2 xyz456 FR | DE | NE Warrior My Nu Leng 27282930
Mix Vol 2 xyz456 US | CA I'm Madonna Madonna 31323334
Mix Vol 2 xyz456 CA | US | FR Trouble Gregor Salto 35363738
What I'm hoping to do here find out which tracks do not have the US territory, and if any of the tracks do not have the US territory, I will want to drop the entire mix from my dataframe. With the resulting dataframe looking like this because "Mix Vol 2" is missing the US territory in one of its tracks:
Mix_Name Mix_ID Territories Track_Title Artist Track_ID
Mix Vol 1 abc123 GB | US | MX Cabala Prok & Fitch 12345678
Mix Vol 1 abc123 US | CA Autosave Fedde 9101112
Mix Vol 1 abc123 US | CA 4 Life Firebeatz 4151617
Mix Vol 1 abc123 IE | MX 4 Life Firebeatz 18192021
I know that I need to Groupby: 'Mix_ID', 'Track_Title', and 'Artist' but I'm unsure of how to search the territories column to see if it contains the "US" territory. Any help would be much appreciated!
df.groupby(['Mix_Name', 'Track', 'Artist']).filter(lambda x: (x['Territories'].str.contains('US').any()))
You do your groupby, then filter the groups to check if the territories column contains 'US'.
EDIT
notInUS = df.groupby(['Mix_Name','Track_Title','Artist']).filter(lambda x: ~x['Territories'].str.contains('US').any())['Mix_Name'].unique()
df[~df['Mix_Name'].isin(notInUS)]
We can look for the unique Mix_Name's that doesn't have a track in the US, then filter it out of our main df.