R - Identify and remove ONE instance of duplicate rows


For context: this is a follow-up to a question I recently posted: R - Identify and remove duplicate rows based on two columns

I need to do something very similar to what I described in that post, but let me explain here in full.

I have some data that looks like this (in case it's relevant, there are MANY other columns with other data):

Course_ID   Text_ID
33          17
33          17
58          17
5           22
8           22
42          25
42          25
17          26
17          26
35          39
51          39

I need to identify any instances where there are two or more matching values for Course_ID AND Text_ID. For example, in the data above, the first two rows in both columns are identical (33 and 17). I need to remove just one of these duplicate lines wherever they occur.

The final data should look like this:

Course_ID   Text_ID
33          17
58          17
5           22
8           22
42          25
17          26
35          39
51          39

The solution offered in my previous post removed every row that had a duplicate, rather than leaving one copy of each.

Thanks in advance.


Solution

  • subset(df, !duplicated(df[c('Course_ID', 'Text_ID')]))
       Course_ID Text_ID
    1         33      17
    3         58      17
    4          5      22
    5          8      22
    6         42      25
    8         17      26
    10        35      39
    11        51      39
    

    or, equivalently, with plain indexing:

    df[!duplicated(df[c('Course_ID', 'Text_ID')]), ]
    

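    If the data frame had only the two columns shown, unique(df) would be enough. Since unique() compares every column, it will not help when the other columns differ between the duplicate rows, so use one of the duplicated() forms above in that case.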
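
    For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question. The Note column is made up (it is not in the original data) and only stands in for the "many other columns" the question mentions, so that the difference from unique(df) is visible.

```r
# Sample data from the question. Note is a hypothetical extra column
# (an assumption, not part of the original data) that differs on every row.
df <- data.frame(
  Course_ID = c(33, 33, 58, 5, 8, 42, 42, 17, 17, 35, 51),
  Text_ID   = c(17, 17, 17, 22, 22, 25, 25, 26, 26, 39, 39),
  Note      = letters[1:11]
)

# TRUE for each row whose (Course_ID, Text_ID) pair already appeared
# in an earlier row; the first occurrence of a pair is always FALSE.
is_dup <- duplicated(df[c("Course_ID", "Text_ID")])

# Keep the first row of each pair and drop the later repeats.
deduped <- df[!is_dup, ]

# unique() compares all columns, so with Note present it drops nothing.
n_unique  <- nrow(unique(df))
n_deduped <- nrow(deduped)
```

    duplicated() keeps the first occurrence by default, so the original row names of the surviving rows are preserved, which is why the output in the answer shows 1, 3, 4, 5, 6, 8, 10, 11.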