I have two data set. Data set 1 and Data set 2 which is as follow:
Dataset1:-
family_id house_id number_family_member
1 1052 2
2 5042 3
3 1111 2
Dataset2:-
family_id house_id age gender
1 1052 24 male
1 1052 25 female
2 5042 23 male
2 5042 20 female
3 1111 1 male
3 1111 20 female
3 1111 21 female
Here is the mismatch between the number of member entered in dataset1 and details of individual entered in dataset2. Like For family id 2, the number of member in family is 3 in dataset1 but the in dataset2 there is entry of only 2 member. How to identify these types of mismatch between two data sets????
both of these views might be helpful for you :
dataset2 %>%
add_count(family_id) %>%
inner_join(dataset1) %>%
mutate(match= n ==number_family_member)
# # A tibble: 7 x 7
# family_id house_id age gender n number_family_member match
# <int> <int> <int> <fctr> <int> <int> <lgl>
# 1 1 1052 24 male 2 2 TRUE
# 2 1 1052 25 female 2 2 TRUE
# 3 2 5042 23 male 2 3 FALSE
# 4 2 5042 20 female 2 3 FALSE
# 5 3 1111 1 male 3 2 FALSE
# 6 3 1111 20 female 3 2 FALSE
# 7 3 1111 21 female 3 2 FALSE
dataset2 %>%
count(family_id) %>%
inner_join(dataset1) %>%
mutate(match= n ==number_family_member)
# # A tibble: 3 x 5
# family_id n house_id number_family_member match
# <int> <int> <int> <int> <lgl>
# 1 1 2 1052 2 TRUE
# 2 2 2 5042 3 FALSE
# 3 3 3 1111 2 FALSE