This is a sample of my dataset
> head(dataset, 20)
nquest nord tpens
1 173 1 1800
2 633 1 300
3 633 1 600
4 923 1 500
5 2886 1 1211
6 2886 2 2100
7 5416 1 700
8 7886 1 1800
9 7886 1 200
10 20297 1 1200
11 20711 2 2000
12 22169 1 600
13 22169 1 280
14 22173 2 1000
15 22276 1 1200
16 22286 1 850
17 22286 2 650
18 22657 1 1400
19 22657 2 1500
20 23490 1 1400
The variables are:
nquest = the code of the family to which the individual belongs
nord = the position of the individual in the family (1 = husband, 2 = wife, 3 = son, etc.)
tpens = the wage that each of them earns

I need to find out whether a specific individual has more than one value for the variable tpens. To identify a single person, both nquest and nord must be taken into account, because together they have to be the same on different rows.
To be clearer: how can I compute how many observations refer to the same individual?
I've tried
dim(dataset[duplicated(dataset$nquest & dataset$nord),])[1]
sum(duplicated(dataset$nquest & dataset$nord))
But I'm pretty sure this code is wrong, because it counts all the rows where nquest is equal and, separately, all the rows where nord is equal. What I actually need is the count of rows where BOTH have the same value at the same time.
# Keep only the rows whose (nquest, nord) pair occurs exactly once
subset(df, !(duplicated(df[1:2]) | duplicated(df[1:2], fromLast = TRUE)))
# A tibble: 14 × 3
nquest nord tpens
<dbl> <dbl> <dbl>
1 173 1 1800
2 923 1 500
3 2886 1 1211
4 2886 2 2100
5 5416 1 700
6 20297 1 1200
7 20711 2 2000
8 22173 2 1000
9 22276 1 1200
10 22286 1 850
11 22286 2 650
12 22657 1 1400
13 22657 2 1500
14 23490 1 1400
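
To count the repeated observations rather than drop them, something along these lines may help. It is only a sketch on the 20 sample rows from the question; the names keys, n_repeats, counts, n_obs and multi are made up for illustration. The original attempt goes wrong because `dataset$nquest & dataset$nord` coerces both columns to logical, so every row becomes TRUE and duplicated() flags all rows except the first (19 here). Passing both columns to duplicated() as a data frame compares the pairs instead.

```r
# Sample rows copied from the question (first 20 rows of the data)
dataset <- data.frame(
  nquest = c(173, 633, 633, 923, 2886, 2886, 5416, 7886, 7886, 20297,
             20711, 22169, 22169, 22173, 22276, 22286, 22286, 22657, 22657, 23490),
  nord   = c(1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1),
  tpens  = c(1800, 300, 600, 500, 1211, 2100, 700, 1800, 200, 1200,
             2000, 600, 280, 1000, 1200, 850, 650, 1400, 1500, 1400)
)

# The two columns that together identify one individual
keys <- dataset[c("nquest", "nord")]

# Rows that repeat an (nquest, nord) pair already seen further up
n_repeats <- sum(duplicated(keys))

# Number of observations for each individual
counts <- aggregate(tpens ~ nquest + nord, data = dataset, FUN = length)
names(counts)[3] <- "n_obs"

# Individuals that appear on more than one row
multi <- counts[counts$n_obs > 1, ]
```

On the sample, n_repeats is 3 and multi lists the three individuals (nquest 633, 7886 and 22169, all with nord 1) that each have two rows. If only differing wages matter, replacing `FUN = length` with `FUN = function(x) length(unique(x))` would count distinct tpens values per individual instead.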