
How to find duplicates while considering two columns at the same time?


This is a sample of my dataset

> head(dataset, 20)
   nquest nord tpens
1     173    1  1800
2     633    1   300
3     633    1   600
4     923    1   500
5    2886    1  1211
6    2886    2  2100
7    5416    1   700
8    7886    1  1800
9    7886    1   200
10  20297    1  1200
11  20711    2  2000
12  22169    1   600
13  22169    1   280
14  22173    2  1000
15  22276    1  1200
16  22286    1   850
17  22286    2   650
18  22657    1  1400
19  22657    2  1500
20  23490    1  1400

The variables are:

  1. nquest = the code of the family to which the individual belongs
  2. nord = the position of the individual in the family (1 = husband, 2 = wife, 3 = son, etc.)
  3. tpens = the wage that each individual earns

I need to find out whether a specific individual has more than one value for the variable tpens. To identify a single person, both nquest and nord must be taken into account, because both have to match across rows. To make this clearer:

[Image: Dataset]

How can I count how many observations refer to the same individual?

I've tried

dim(dataset[duplicated(dataset$nquest & dataset$nord),])[1]

sum(duplicated(dataset$nquest & dataset$nord))

But I'm pretty sure this code is wrong, because it counts all the rows where nquest is equal and does the same for nord. What I actually need is the count of rows where BOTH have the same value at the same time.


Solution

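Apply duplicated() to the first two columns together, scanning in both directions. This keeps only the rows whose (nquest, nord) pair occurs exactly once:

    subset(df, !(duplicated(df[1:2]) | duplicated(df[1:2], fromLast = TRUE)))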
    
    # A tibble: 14 × 3
       nquest  nord tpens
        <dbl> <dbl> <dbl>
     1    173     1  1800
     2    923     1   500
     3   2886     1  1211
     4   2886     2  2100
     5   5416     1   700
     6  20297     1  1200
     7  20711     2  2000
     8  22173     2  1000
     9  22276     1  1200
    10  22286     1   850
    11  22286     2   650
    12  22657     1  1400
    13  22657     2  1500
    14  23490     1  1400
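
The question also asks for a count of repeated individuals, so here is a minimal base R sketch of one way to get it. It assumes the data frame is called dataset, as in the question, and rebuilds the 20 sample rows so the snippet runs on its own; the names key, n_extra, is_dup, dup_rows, counts and n_obs are introduced here for illustration only. As a side note, `dataset$nquest & dataset$nord` is an element-wise logical AND, so both columns are coerced to TRUE/FALSE before duplicated() ever sees them, which is why the original attempt cannot compare the pairs.

```r
# Sample data copied from the question (first 20 rows).
dataset <- data.frame(
  nquest = c(173, 633, 633, 923, 2886, 2886, 5416, 7886, 7886, 20297,
             20711, 22169, 22169, 22173, 22276, 22286, 22286, 22657,
             22657, 23490),
  nord   = c(1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1),
  tpens  = c(1800, 300, 600, 500, 1211, 2100, 700, 1800, 200, 1200,
             2000, 600, 280, 1000, 1200, 850, 650, 1400, 1500, 1400)
)

# Pass both identifying columns as a data frame so that duplicated()
# compares whole (nquest, nord) rows rather than a single vector.
key <- dataset[c("nquest", "nord")]

# Number of rows that repeat an earlier (nquest, nord) pair.
n_extra <- sum(duplicated(key))

# Every row whose (nquest, nord) pair occurs more than once,
# including the first occurrence.
is_dup   <- duplicated(key) | duplicated(key, fromLast = TRUE)
dup_rows <- dataset[is_dup, ]

# Number of observations per repeated individual.
counts <- aggregate(tpens ~ nquest + nord, data = dup_rows, FUN = length)
names(counts)[3] <- "n_obs"

print(n_extra)
print(dup_rows)
print(counts)
```

Here n_extra counts only the extra rows, while dup_rows keeps every observation of a repeated individual so the differing tpens values can be inspected side by side. With dplyr loaded, `dataset %>% count(nquest, nord) %>% filter(n > 1)` should give an equivalent per-individual count.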