I have results from an A/B test that I need to evaluate, but while checking the data I noticed that some users appear in both the A and B groups. I need to drop them so they don't skew the test. My data looks something like this:
   transactionId  visitorId        date  revenue group
0      906125958          0  2019-08-16     10.8     B
1     1832336629          1  2019-08-04     25.9     B
2     3698129301          2  2019-08-01    165.7     B
3     4214855558          2  2019-08-07     30.5     A
4      797272108          3  2019-08-23    100.4     A
What I need to do is remove every user that was in both A and B groups while leaving the rest intact. So from the example data I need this output:
   transactionId  visitorId        date  revenue group
0      906125958          0  2019-08-16     10.8     B
1     1832336629          1  2019-08-04     25.9     B
4      797272108          3  2019-08-23    100.4     A
I've tried various approaches but can't seem to figure it out, and I couldn't find an answer anywhere. I would really appreciate some help. Thanks in advance!
You can get a list of users that are in just one group like this:
group_counts = df.groupby('visitorId').agg({'group': 'nunique'})  # number of distinct groups per visitor
to_include = group_counts[group_counts['group'] == 1]  # keep only visitors seen in exactly one group
And then filter your original data according to which visitors are in that list:
df = df[df['visitorId'].isin(to_include.index)]
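For reference, here is a minimal self-contained sketch of the same idea run against the sample data from the question. It swaps in a slightly different technique, `groupby(...).transform('nunique')`, which puts the per-visitor group count on every row so the filter takes one step. The names `groups_per_visitor` and `clean` are just illustrative, and the DataFrame is rebuilt by hand on the assumption that the columns match the ones shown above.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed column layout).
df = pd.DataFrame({
    "transactionId": [906125958, 1832336629, 3698129301, 4214855558, 797272108],
    "visitorId": [0, 1, 2, 2, 3],
    "date": ["2019-08-16", "2019-08-04", "2019-08-01", "2019-08-07", "2019-08-23"],
    "revenue": [10.8, 25.9, 165.7, 30.5, 100.4],
    "group": ["B", "B", "B", "A", "A"],
})

# For every row, count how many distinct groups that row's visitor appears in.
groups_per_visitor = df.groupby("visitorId")["group"].transform("nunique")

# Keep only rows whose visitor was seen in exactly one group.
clean = df[groups_per_visitor == 1]

print(clean)
```

With the sample data this should drop both rows for visitorId 2 and keep the rows with index 0, 1 and 4. The original index is preserved; call `clean.reset_index(drop=True)` if you want it renumbered.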