
R remove duplicates completely across different groups

I have a dataset that can be replicated with the following R code:

mydata <- data.frame(Name =c('Alex','Brenda','Carl','Alex','Daniel',
 Product_Name = c('A','A','A','A','A','A','A','B','B','B','B','B'), 
Use = c(0,0,0,1,1,1,1,0,0,1,1,1))

This is a dataset from a survey of product usage. Name contains the name of the user, Product_Name is the name of the product (Product A or Product B; the real dataset has more than two), and Use records whether the user uses the product (1 = yes, 0 = no).

Unfortunately, some individuals answered both yes and no when asked whether they use a given product. I want to remove these individuals, but only for the Product_Name in question. In the example, user Alex replied both yes and no for Product A.

Here I want to remove Alex only for Product A and keep Alex for Product B. In the desired result, both of Alex's rows for Product A are gone and Alex's entry for Product B is still there.

I know that I can remove duplicates using the unique() function in R, but that would still leave one row for Alex in Product A. I would also like to limit the search for duplicate names to within each Product_Name (i.e. only Product A, only Product B, and so on). Any help will be appreciated.

Please let me know if the question is not very clear. Thanks in advance.
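To make the point about unique() concrete, here is a minimal base R sketch on a small made-up data frame. The `toy` data and its values are assumptions for illustration, not the survey data. It uses duplicated(), which flags the same rows that unique() would drop, and shows that deduplicating on Name and Product_Name keeps the first of Alex's two rows for Product A, whereas flagging from both directions removes both.

```r
# Hypothetical toy data: Alex answered both 0 and 1 for Product A
toy <- data.frame(
  Name         = c("Alex", "Brenda", "Alex", "Alex"),
  Product_Name = c("A", "A", "A", "B"),
  Use          = c(0, 0, 1, 0)
)

# duplicated() flags only the second and later occurrences of a key,
# so the first Alex/A row survives the deduplication
keys  <- toy[, c("Name", "Product_Name")]
dedup <- toy[!duplicated(keys), ]

# Checking from both directions flags every row whose key is repeated
repeated <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
cleaned  <- toy[!repeated, ]
```

On this toy data, `dedup` still contains one Alex row for Product A, while `cleaned` contains only Brenda for Product A and Alex for Product B.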


Now suppose we have the following scenario:

mydata <- data.frame(Name =c('Alex','Brenda','Carl','Alex','Daniel',
 Product_Name = c('A','A','A','A','A','A','A','B','B','B','B',
Use = c(0,0,0,1,1,1,1,0,0,1,1,1,0,0,1,1))

In addition to the condition above, where a person with both Use = 0 and Use = 1 for a product is deleted, I have a further condition. If a user has multiple entries and Use = 0 in all of them, the observations are kept. However, if a user has multiple entries with Use = 1, they are deleted. For instance, I would like to keep the observations for Mary, whose entries all have Use = 0, and drop the observations for Richard, whose entries have Use = 1.

In the final output that I would like to get, Mary is not deleted, since Use = 0 for both of her entries. Richard's observations are deleted, since Use = 1 for him.



  • Original Question

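    library(dplyr)

    # Keep only (Product_Name, Name) pairs that occur exactly once
    mydata %>%
      group_by(Product_Name, Name) %>%
      filter(n() == 1)

    Follow-up Question

    # Also keep repeated pairs when every Use value in the pair is 0
    mydata %>%
      group_by(Product_Name, Name) %>%
      filter(n() == 1 | all(Use == 0))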
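
As a quick check of both filters, here is a minimal sketch on a made-up data frame. The data in the question is cut off, so `toy` and its values are assumptions chosen to cover each case: Alex answers 0 and 1 for Product A, Mary has two 0 answers for Product B, and Richard has two 1 answers for Product B. The sketch uses `n()` and `all()`, which should give the same result as the conditions above on this data.

```r
library(dplyr)

# Hypothetical data, not the original survey
toy <- data.frame(
  Name         = c("Alex", "Brenda", "Alex", "Alex",
                   "Mary", "Mary", "Richard", "Richard"),
  Product_Name = c("A", "A", "A", "B", "B", "B", "B", "B"),
  Use          = c(0, 0, 1, 0, 0, 0, 1, 1)
)

# Original question: keep only Name/Product_Name pairs that appear once
original <- toy %>%
  group_by(Product_Name, Name) %>%
  filter(n() == 1) %>%
  ungroup()

# Follow-up: additionally keep repeated pairs whose Use values are all 0
follow_up <- toy %>%
  group_by(Product_Name, Name) %>%
  filter(n() == 1 | all(Use == 0)) %>%
  ungroup()
```

With this toy data, `original` keeps Brenda (Product A) and Alex (Product B). `follow_up` also keeps Mary's two rows, and Richard is dropped in both.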