I am working in R
sample data:
Each person has an id assigned from their social security/national insurance number (actual_id). actual_id is assumed to be correct.
This column may not be needed, but I have also worked out how many times each person appears in the data (number). For example, the person with actual_id = 1 appears twice, so number = 2 for that person.
I have created a new method of identifying people that does not use the social security/national insurance number. The id it generates is in the proposed_id column.
df <- data.frame(actual_id = c(1, 1, 2, 2, 2, 3, 3),
                 proposed_id = c("a", "a", "b", "b", "c", "d", "b"),
                 number = c(2, 2, 3, 3, 3, 2, 2))
print(df)
actual_id | number | proposed_id |
---|---|---|
1 | 2 | a |
1 | 2 | a |
2 | 3 | b |
2 | 3 | b |
2 | 3 | c |
3 | 2 | d |
3 | 2 | b |
Desired outcome
I want a way of assessing whether each proposed_id is used for only one actual_id, i.e. assessment should be TRUE when every row with that proposed_id has the same actual_id:
My actual data has 90,000 rows.
actual_id | number | proposed_id | assessment |
---|---|---|---|
1 | 2 | a | TRUE |
1 | 2 | a | TRUE |
2 | 3 | b | FALSE |
2 | 3 | b | FALSE |
2 | 3 | c | TRUE |
3 | 2 | d | TRUE |
3 | 2 | b | FALSE |
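# base R: ave() runs the check per proposed_id group; because actual_id is numeric,
# the TRUE/FALSE result is coerced to 1/0, so "> 0" turns it back into a logical
with(df, ave(actual_id, proposed_id, FUN = function(z) length(unique(z)) == 1)) > 0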
# [1] TRUE TRUE FALSE FALSE TRUE TRUE FALSE
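To make the `ave()` call above easier to follow, here is the same check split into two explicit base R steps: count the distinct `actual_id` values per `proposed_id`, then look up each row's `proposed_id` in that count. This is only a sketch; the names `n_ids` and `assessment` are chosen here for illustration.

```r
df <- data.frame(actual_id = c(1, 1, 2, 2, 2, 3, 3),
                 proposed_id = c("a", "a", "b", "b", "c", "d", "b"),
                 number = c(2, 2, 3, 3, 3, 2, 2))

# Step 1: number of distinct actual_id values seen for each proposed_id
# (a named integer vector, one element per proposed_id)
n_ids <- vapply(split(df$actual_id, df$proposed_id),
                function(z) length(unique(z)),
                integer(1))

# Step 2: look up each row's proposed_id by name;
# TRUE when that proposed_id maps to exactly one actual_id
assessment <- unname(n_ids[as.character(df$proposed_id)] == 1)
assessment
# [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE
```

The intermediate `n_ids` can also be inspected on its own to see how many people each `proposed_id` has been given to.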
library(dplyr)
# .by (per-operation grouping) requires dplyr >= 1.1.0
df %>%
  mutate(assessment = n_distinct(actual_id) == 1, .by = proposed_id)
#   actual_id proposed_id number assessment
# 1         1           a      2       TRUE
# 2         1           a      2       TRUE
# 3         2           b      3      FALSE
# 4         2           b      3      FALSE
# 5         2           c      3       TRUE
# 6         3           d      2       TRUE
# 7         3           b      2      FALSE
library(data.table)
as.data.table(df)[, assessment := uniqueN(actual_id) == 1, by = "proposed_id"][]
#    actual_id proposed_id number assessment
#        <num>      <char>  <num>     <lgcl>
# 1:         1           a      2       TRUE
# 2:         1           a      2       TRUE
# 3:         2           b      3      FALSE
# 4:         2           b      3      FALSE
# 5:         2           c      3       TRUE
# 6:         3           d      2       TRUE
# 7:         3           b      2      FALSE
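With 90,000 rows it may be more practical to list only the proposed_id values that fail the check, rather than scanning a row-level flag. A minimal base R sketch of that, assuming the same column names as the sample data (`shared` and `clashes` are illustrative names, not part of the approaches above):

```r
df <- data.frame(actual_id = c(1, 1, 2, 2, 2, 3, 3),
                 proposed_id = c("a", "a", "b", "b", "c", "d", "b"),
                 number = c(2, 2, 3, 3, 3, 2, 2))

# TRUE for rows whose proposed_id is shared by more than one actual_id
shared <- with(df, ave(actual_id, proposed_id,
                       FUN = function(z) length(unique(z))) > 1)

# One row per distinct (proposed_id, actual_id) pair among the flagged rows
clashes <- unique(df[shared, c("proposed_id", "actual_id")])
clashes
```

On the sample data this leaves two rows, both for proposed_id "b" (actual_id 2 and 3).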
Sample data
df <- structure(list(actual_id = c(1, 1, 2, 2, 2, 3, 3), proposed_id = c("a", "a", "b", "b", "c", "d", "b"), number = c(2, 2, 3, 3, 3, 2, 2)), class = "data.frame", row.names = c(NA, -7L))