python, pandas, ab-testing

How to see if one value has 2 matches in 1 column in pandas


I have results from an A/B test that I need to evaluate, but while checking the data I noticed that some users appear in both groups. I need to drop them so they don't skew the test. My data looks something like this:

   transactionId  visitorId        date  revenue group
0      906125958          0  2019-08-16     10.8     B
1     1832336629          1  2019-08-04     25.9     B
2     3698129301          2  2019-08-01    165.7     B
3     4214855558          2  2019-08-07     30.5     A
4      797272108          3  2019-08-23    100.4     A

What I need to do is remove every user that was in both the A and B groups while leaving the rest intact. So from the example data I need this output:

   transactionId  visitorId        date  revenue group
0      906125958          0  2019-08-16     10.8     B
1     1832336629          1  2019-08-04     25.9     B
4      797272108          3  2019-08-23    100.4     A

I tried to do it in various ways but I can't seem to figure it out, and I couldn't find an answer for it anywhere. I would really appreciate some help here, thanks in advance.


Solution

  • You can get a list of users that are in just one group like this:

    # Count the distinct groups each visitor appears in (indexed by visitorId)
    group_counts = df.groupby('visitorId').agg({'group': 'nunique'})
    # Keep only the visitors that appear in exactly one group
    to_include = group_counts[group_counts['group'] == 1]
    

    And then filter your original data according to which visitors are in that list:

    df = df[df['visitorId'].isin(to_include.index)]
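
As a possible alternative, the same filtering can be sketched in one step with `groupby(...).transform('nunique')`, which puts the per-visitor group count on every row so it can be used directly as a mask. The sample frame below is rebuilt from the data in the question; the names `groups_per_visitor` and `clean` are made up for this illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "transactionId": [906125958, 1832336629, 3698129301, 4214855558, 797272108],
    "visitorId": [0, 1, 2, 2, 3],
    "date": ["2019-08-16", "2019-08-04", "2019-08-01", "2019-08-07", "2019-08-23"],
    "revenue": [10.8, 25.9, 165.7, 30.5, 100.4],
    "group": ["B", "B", "B", "A", "A"],
})

# Number of distinct groups per visitor, broadcast back onto every row.
groups_per_visitor = df.groupby("visitorId")["group"].transform("nunique")

# Keep only rows whose visitor appears in exactly one group.
clean = df[groups_per_visitor == 1]
print(clean)
```

On the sample data this should drop both rows for visitor 2 and keep the original index labels of the remaining rows. Whether this reads better than the two-step `isin` version is mostly a matter of taste.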