I have some data that looks like this:
Course_ID Text_ID
33 17
33 17
58 17
5 22
8 22
42 25
42 25
17 26
17 26
35 39
51 39
Not having a background in programming, I'm finding it tricky to articulate my question, but here goes: I only want to keep rows where Course_ID varies but Text_ID is the same. So, for example, the final data would look something like this:
Course_ID Text_ID
5 22
8 22
35 39
51 39
As you can see, Text_ID 22 and 39 are the only ones that have different Course_ID values. I suspect subsetting the data would be the way to go, but as I said, I'm quite a novice at this kind of thing and would really appreciate any advice on how to approach this.
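Select those groups of Text_ID in which no Course_ID is repeated, i.e. where the number of distinct Course_ID values equals the number of rows in the group. This also drops Text_ID 17, where 33 appears twice, which matches your expected output.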
In dplyr you can write this as:
library(dplyr)
# Keep only the Text_ID groups in which every Course_ID is distinct
df %>%
  group_by(Text_ID) %>%
  filter(n_distinct(Course_ID) == n()) %>%
  ungroup()
# Course_ID Text_ID
# <int> <int>
#1 5 22
#2 8 22
#3 35 39
#4 51 39
and in data.table:
library(data.table)
# setDT() converts df to a data.table by reference; Text_ID comes first in the result
setDT(df)[, .SD[uniqueN(Course_ID) == .N], Text_ID]
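If you would rather not load a package, the same idea can be sketched in base R. This is only a sketch: it assumes the data are in a plain data frame called df with the two columns shown in the question, and the names keep and result are made up for illustration. ave() applies a function within each Text_ID group, and anyDuplicated() returns 0 when a group has no repeated Course_ID.

```r
# Rebuild the example data from the question (assumed to be a plain data frame)
df <- data.frame(
  Course_ID = c(33, 33, 58, 5, 8, 42, 42, 17, 17, 35, 51),
  Text_ID   = c(17, 17, 17, 22, 22, 25, 25, 26, 26, 39, 39)
)

# For each Text_ID group, flag every row with 1 if no Course_ID is repeated
# in that group and 0 otherwise; ave() recycles the single value per group
keep <- ave(df$Course_ID, df$Text_ID,
            FUN = function(x) anyDuplicated(x) == 0) == 1

# Subset to the flagged rows
result <- df[keep, ]
result
```

Unlike the data.table version, this keeps the original column order and the original row names.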