R - Extracting duplicates to a dataframe


I need help with R. Similar to the question filtering-a-dataframe-showing-only-duplicates, I wish to extract the duplicates from a dataframe with over 2,000 entries.

The first 15 rows of data look like this:

run id Diff
1 20 0
1 4 1024
1 4 1
1 4 1
1 4 65
1 4 1
1 4 1
1 11 475
1 11 1
1 11 1
2 25 0
2 18 0
2 18 1
2 18 1
2 18 1

I wish to extract only the duplicates, i.e.

run id Diff
1 4 1024
1 4 1
1 4 1
1 4 65
1 4 1
1 4 1
1 11 475
1 11 1
1 11 1
2 18 0
2 18 1
2 18 1
2 18 1

Using the command

    mydata_extract %>% group_by(id) %>% filter(n() > 1)

does not extract the duplicates; in fact, I get the complete set of data returned. Is there something about "filter(n() > 1)" that I need to change? I'm a beginner with R. Sorry my data table is not formatting correctly, it looks okay in preview!

I will also want to group my data first by "run".


Solution

  • Maybe group by both run and id in the group_by()?

      library(dplyr)

      # Sample data: the first 15 rows from the question
      df <- tibble::tribble(
        ~run, ~id, ~Diff,
        1, 20, 0,
        1, 4, 1024,
        1, 4, 1,
        1, 4, 1,
        1, 4, 65,
        1, 4, 1,
        1, 4, 1,
        1, 11, 475,
        1, 11, 1,
        1, 11, 1,
        2, 25, 0,
        2, 18, 0,
        2, 18, 1,
        2, 18, 1,
        2, 18, 1
      )

      # Keep only rows whose (run, id) combination occurs more than once
      df %>%
        group_by(run, id) %>%
        filter(n() > 1)

      # A tibble: 13 x 3
      # Groups:   run, id [3]
           run    id  Diff
         <dbl> <dbl> <dbl>
       1     1     4  1024
       2     1     4     1
       3     1     4     1
       4     1     4    65
       5     1     4     1
       6     1     4     1
       7     1    11   475
       8     1    11     1
       9     1    11     1
      10     2    18     0
      11     2    18     1
      12     2    18     1
      13     2    18     1
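
One likely reason grouping by id alone returned every row: in the full data an id may occur only once within a run but again in another run, so n() counted across runs is still greater than 1. A minimal sketch of that effect, assuming made-up data (the `toy` tibble and the names `by_id` and `by_run_id` are hypothetical, not from the question):

```r
library(dplyr)

# Hypothetical data: id 20 appears once in run 1 and once in run 2
toy <- tibble::tribble(
  ~run, ~id, ~Diff,
  1, 20, 0,
  1,  4, 1024,
  1,  4, 1,
  2, 20, 0
)

# Grouping by id only: id 20 has two rows overall, so nothing is dropped
by_id <- toy %>%
  group_by(id) %>%
  filter(n() > 1) %>%
  ungroup()

# Grouping by run and id: id 20 is unique within each run, so it is removed
by_run_id <- toy %>%
  group_by(run, id) %>%
  filter(n() > 1) %>%
  ungroup()

nrow(by_id)      # 4
nrow(by_run_id)  # 2
```

Whether this is what happened in the 2,000-row data cannot be confirmed from the 15 rows shown, but it matches the symptom described.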
    

    You can add a mutate() to see how n() works (it counts the number of rows per group), e.g.

    df %>% 
     group_by(run, id) %>% 
      mutate(n = n())
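
As a variation on the same idea (not part of the original answer), dplyr's count() returns one row per group with its size, and add_count() attaches that size to every row without leaving the data grouped. The sketch below assumes a shortened, hypothetical copy of the data called `rows`:

```r
library(dplyr)

# Shortened, hypothetical copy of the question's data
rows <- tibble::tribble(
  ~run, ~id, ~Diff,
  1, 20, 0,
  1,  4, 1024,
  1,  4, 1,
  2, 18, 0,
  2, 18, 1
)

# One row per (run, id) combination, with the group size in column n
sizes <- rows %>% count(run, id)

# Attach the group size to each row, keep groups with more than one row,
# then drop the helper column again
dups <- rows %>%
  add_count(run, id) %>%
  filter(n > 1) %>%
  select(-n)
```

Here `dups` holds the same rows that group_by(run, id) followed by filter(n() > 1) would keep, but as an ungrouped tibble, which can be convenient if further steps should not run per group.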