I was hoping someone knew of an easy/efficient way in dplyr to define an indicator variable that takes the value 1 if, on Date X, an IP address was present more than 50 times. The data has two columns: one of IP addresses and the other of the associated access dates.
As an example, I would like the following output in the Robot column (using a threshold of >= 3 occurrences of a Date/IP combination instead of > 50):
IP Date Robot
1 A 1
1 A 1
1 A 1
1 B 0
2 B 0
2 C 1
2 C 1
2 C 1
3 C 0
3 D 0
4 A 0
Thanks!
You can group_by the two variables and use n() to count how many times each address was present on that day. Note that R is case-sensitive, so the column names must match the data (IP and Date):
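library(dplyr)
df %>% group_by(Date, IP) %>%   # one group per Date/IP combination
  mutate(Robot = as.numeric(n() > 50)) %>%   # 1 if that IP appears > 50 times on that date
  ungroup()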
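A minimal sketch that checks this approach against the sample table from the question, assuming the data sits in a data frame with columns IP and Date (the names `threshold`, `flagged` and `flagged2` are illustrative). It uses the example's threshold of >= 3 so the result can be compared with the expected Robot column. add_count() is shown as an equivalent alternative: it appends the group size as a column `n` instead of calling n() inside mutate().

```r
library(dplyr)

# Sample data reconstructed from the table in the question
df <- data.frame(
  IP   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4),
  Date = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "A"),
  stringsAsFactors = FALSE
)

# Assumption: the example uses >= 3; swap in 50 (and > or >=) for the real data
threshold <- 3

# Approach from the answer: n() is the number of rows in the current Date/IP group
flagged <- df %>%
  group_by(IP, Date) %>%
  mutate(Robot = as.integer(n() >= threshold)) %>%
  ungroup()

print(flagged)

# Alternative: add_count() stores the group size in column `n`, which is then dropped
flagged2 <- df %>%
  add_count(IP, Date) %>%
  mutate(Robot = as.integer(n >= threshold)) %>%
  select(-n)
```

Both versions keep the original row order, so the Robot column lines up with the question's table row by row.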