I would like help writing a function that iterates through a dataframe by group (color, with the colors not specified in advance) and checks which ids do not match the majority id within each color. Whatever id accounts for more than half of the populated ids in a color is treated as the correct one for that color. The real dataset will most likely have 10-50 rows per color, and a color may contain more than one out-of-place id. Ideally the output would include the string note 'Flag for later research'; if it is easier, a simple 0/1 column would do and I can write the corresponding text logic myself. I am having trouble figuring out where to start: a groupby with nunique, a loop, or something that combines the two.
Sample of data:
color id commitment Note (where I need help)
blue 1 10
blue 1 5
blue 1 15
blue 2 10 Flag for later research
blue 1 9
green 3 10
green 3 11
green 2 12 Flag for later research
green 3 15
This code flags each row whose (color, id) combination occurs only once:
df['Note'] = ~df.duplicated(['color','id'], keep=False)
gives you:
color id commitment Note
0 blue 1 10 False
1 blue 1 5 False
2 blue 1 15 False
3 blue 2 10 True
4 blue 1 9 False
5 green 3 10 False
6 green 3 11 False
7 green 2 12 True
8 green 3 15 False
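Note that `duplicated(..., keep=False)` only catches ids that appear exactly once within a color, so an out-of-place id that occurs twice in the same color would not be flagged. If that matters, here is a minimal sketch of the majority rule described in the question. The function name `flag_minority_ids` and the output column `Flag` are made up for illustration, and the sketch assumes every color has at least one populated id:

```python
import numpy as np
import pandas as pd


def flag_minority_ids(df, group_col="color", id_col="id"):
    """Flag rows whose id differs from the strict-majority id of their group.

    Hypothetical helper; assumes each group has at least one non-null id.
    """
    ids = df.groupby(group_col)[id_col]

    # Most frequent id per group, broadcast back to every row of that group.
    majority_id = ids.transform(lambda s: s.value_counts().idxmax())

    # Share of the group's populated ids held by that most frequent id
    # (value_counts ignores missing ids by default).
    majority_share = ids.transform(
        lambda s: s.value_counts(normalize=True).max()
    )

    # Only flag when a true majority (more than half) exists, and only
    # rows that actually have an id.
    out_of_place = (
        df[id_col].notna()
        & df[id_col].ne(majority_id)
        & majority_share.gt(0.5)
    )

    return df.assign(
        Flag=out_of_place.astype(int),
        Note=np.where(out_of_place, "Flag for later research", ""),
    )


df = pd.DataFrame({
    "color": ["blue"] * 5 + ["green"] * 4,
    "id": [1, 1, 1, 2, 1, 3, 3, 2, 3],
    "commitment": [10, 5, 15, 10, 9, 10, 11, 12, 15],
})

result = flag_minority_ids(df)
print(result)
```

On the sample data this flags the same two rows (index 3 and 7) as the one-liner above. When no id holds more than half of a color's populated ids, nothing in that color is flagged; whether that is the right behavior for ties depends on your data.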