Subsetting by multiple aggregate conditions in dplyr


I was hoping someone knew of an easy/efficient way in dplyr to define an indicator variable that takes the value 1 if an IP address was present more than 50 times on a given date. The data has two columns: one of IP addresses and the other of their associated access dates.

As an example, I would like the following output in the Robot column (using a threshold of >= 3 occurrences of a Date/IP combination for this small sample).

IP  Date  Robot
1   A     1
1   A     1
1   A     1
1   B     0
2   B     0
2   C     1
2   C     1
2   C     1
3   C     0
3   D     0
4   A     0

Thanks!


Solution

  • You can group_by the two variables and use n() to count how many times each address was present on each day, then compare that count to the threshold.

    library(dplyr)
    # Flag every row whose Date/IP combination occurs more than 50 times
    group_by(df, Date, IP) %>%
      mutate(Robot = as.numeric(n() > 50))
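As a minimal sketch, the same approach can be tried on the sample data from the question. The data frame `df`, the variable `threshold`, and the output name `result` are assumptions made for illustration. The cut-off is set to 3 with `>=` to match the small example, not the 50 from the real data.

```r
library(dplyr)

# Sample data copied from the question's example table
df <- data.frame(
  IP   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4),
  Date = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "A")
)

# Assumed cut-off for this small example; the real data would use 50
threshold <- 3

result <- df %>%
  group_by(Date, IP) %>%                           # one group per Date/IP pair
  mutate(Robot = as.numeric(n() >= threshold)) %>% # n() is the group's row count
  ungroup()                                        # drop grouping for later steps

print(result)
```

Because mutate() keeps every row in its original order, the Robot column lines up with the input rows. The ungroup() call is optional, but it keeps later operations from being applied per group by accident.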