For context: this is a follow-up to a question I recently posted: R - Identify and remove duplicate rows based on two columns
I need to do something very similar to what I described in that post, but let me explain here in full.
I have some data that looks like this (in case it's relevant, there are MANY other columns with other data):
Course_ID Text_ID
33 17
33 17
58 17
5 22
8 22
42 25
42 25
17 26
17 26
35 39
51 39
I need to identify any instances where there are two or more matching values for Course_ID AND Text_ID. For example, in the data above, the first two rows are identical in both columns (33 and 17). I need to remove just one of these duplicate lines wherever they occur.
The final data should look like this:
Course_ID Text_ID
33 17
58 17
5 22
8 22
42 25
17 26
35 39
51 39
The solution offered in my previous post removed all instances of any duplicated rows, rather than keeping one copy of each.
Thanks in advance.
subset(df, !duplicated(df[c('Course_ID', 'Text_ID')]))
Course_ID Text_ID
1 33 17
3 58 17
4 5 22
5 8 22
6 42 25
8 17 26
10 35 39
11 51 39
or even
df[!duplicated(df[c('Course_ID', 'Text_ID')]), ]
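To reproduce this, here is a minimal sketch that rebuilds the sample data from the question (the name `df` and the numeric column types are assumptions). It shows why both forms keep the first copy: `duplicated()` returns TRUE only for the second and later occurrences of a row, so negating it keeps one row per Course_ID/Text_ID pair.

```r
# Sample data from the question; `df` is an assumed name for the data frame.
df <- data.frame(
  Course_ID = c(33, 33, 58, 5, 8, 42, 42, 17, 17, 35, 51),
  Text_ID   = c(17, 17, 17, 22, 22, 25, 25, 26, 26, 39, 39)
)

# TRUE marks the second and later occurrences of each Course_ID/Text_ID pair.
dup <- duplicated(df[c("Course_ID", "Text_ID")])

# Keep every row that is not a repeat, so the first copy of each pair survives.
result <- df[!dup, ]
print(result)
```

Any other columns in the data frame are carried along untouched, because only the two key columns are passed to `duplicated()`.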
If the data frame has only the 2 columns shown, just do unique(df)
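Since the question mentions many other columns, it may help to see how the two approaches differ in that case. This is only a sketch: `df2` and the `Note` column are hypothetical stand-ins for the real data. `unique()` compares whole rows, so two rows that match on Course_ID and Text_ID but differ elsewhere are both kept.

```r
# Hypothetical data: `Note` stands in for the other columns mentioned in the question.
df2 <- data.frame(
  Course_ID = c(33, 33, 58),
  Text_ID   = c(17, 17, 17),
  Note      = c("a", "b", "c")
)

# unique() looks at every column, so both 33/17 rows survive because Note differs.
n_unique <- nrow(unique(df2))

# Restricting duplicated() to the two key columns drops the second 33/17 row.
n_keyed <- nrow(df2[!duplicated(df2[c("Course_ID", "Text_ID")]), ])

print(c(unique = n_unique, keyed = n_keyed))
```

So with extra columns present, the `duplicated()` form on the two key columns is the one that matches the desired output.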