I hope I am explaining this correctly. I have a dataframe in which i need to identify pairs of rows based on the string value of two columns. Each row in the pair must have a different string value in another column. I then need to add a new value to a new column based on TRUE or FALSE condition of that third column AND the condition of the pairing.
For example. A simplified version of the df would be:
The end result would look like this:
Any help would be greatly appreciated.
Hoping this helps, using data similar to the examples you shared.
data = {
'name': ['John', 'John', 'Jane', 'Jane', 'Doe', 'Doe'],
'city': ['LA', 'LA', 'SF', 'SF', 'SD', 'SD'],
'item': ['Peanut Butter', 'Jelly', 'Peanut Butter', 'Peanut Butter',
'Jelly', 'Jelly']
}
df = pd.DataFrame(data)
We can then create a dictionary of the string pairs from name and city with the values being a list of the items associated with the pairs.
unique_dict = df.groupby(['name', 'city'])['item'].apply(list).to_dict()
Once we have this we define a function to handle the logic of what each pair needs and can apply it to the dataframe.
def determine_needs(items):
if 'Peanut Butter' in items and 'Jelly' in items:
return None
elif 'Peanut Butter' in items:
return 'Jelly'
elif 'Jelly' in items:
return 'Peanut Butter'
else:
return None
df['needs'] = df.apply(lambda row: determine_needs(unique_dict[(row['name'], row['city'])]), axis=1)