Checking if a value appears again by group

I am working in R

sample data:

I have people which have an id which has been assigned from their social security/national insurance number (actual_id). actual_id is assumed to be correct.

This column may not be needed, but I have worked out how many times each person appears in the data (number). e.g. we can see the person with actual_id = 1 appears twice in the data, so number = 2 for this person.

I have created a new method of id-ing people, which has nothing to do with social security/national insurance number. The new method has generated an id which is in the proposed_id column.

df <- data.frame(actual_id = c(1, 1, 2, 2, 2, 3, 3), 
                 proposed_id = c("a", "a", "b", "b", "c", "d", "b"), 
                 number = c(2, 2, 3, 3, 3, 2, 2))

print(df)

actual_id	number	proposed_id
1	2	a
1	2	a
2	3	b
2	3	b
2	3	c
3	2	d
3	2	b

Desired outcome

I want a way of assessing if the proposed_id is used only once for any actual_id:

We can see that proposed_id = a has been allocated to actual_id = 1, and does not appear again in proposed_id column. This would therefore be allocated "TRUE".
We can see that proposed_id = b has been allocated to actual_id = 2, but it has then been allocated further down to proposed_id = 3. This would be allocated as "FALSE".

My actual data has 90,000 rows.

actual_id	number	proposed_id	assessment
1	2	a	TRUE
1	2	a	TRUE
2	3	b	FALSE
2	3	b	FALSE
2	3	c	TRUE
3	2	d	TRUE
3	2	b	FALSE

Solution

base R

with(df, ave(actual_id, proposed_id, FUN = function(z) length(unique(z)) == 1)) > 0
# [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE

dplyr

library(dplyr)
df %>%
  mutate(assessment = n_distinct(actual_id) == 1, .by = proposed_id)
#   actual_id proposed_id number assessment
# 1         1           a      2       TRUE
# 2         1           a      2       TRUE
# 3         2           b      3      FALSE
# 4         2           b      3      FALSE
# 5         2           c      3       TRUE
# 6         3           d      2       TRUE
# 7         3           b      2      FALSE

data.table

library(data.table)
as.data.table(df)[, assessment := uniqueN(actual_id) == 1, by = "proposed_id"][]
#    actual_id proposed_id number assessment
#        <num>      <char>  <num>     <lgcl>
# 1:         1           a      2       TRUE
# 2:         1           a      2       TRUE
# 3:         2           b      3      FALSE
# 4:         2           b      3      FALSE
# 5:         2           c      3       TRUE
# 6:         3           d      2       TRUE
# 7:         3           b      2      FALSE

Sample data

df <- structure(list(actual_id = c(1, 1, 2, 2, 2, 3, 3), proposed_id = c("a", "a", "b", "b", "c", "d", "b"), number = c(2, 2, 3, 3, 3, 2, 2)), class = "data.frame", row.names = c(NA, -7L))