I have a dataset looking like this:
df
# A tibble: 21 × 2
animals_id clus_ID
<chr> <int>
1 L085 246
2 L085 246
3 L085 246
4 L084 247
5 L084 247
6 L084 247
7 L085 249
8 L084 249
9 L084 249
10 L087 249
And I want to create another column, "type", telling me whether animals_id within clus_ID differs or not (that is, is it only one animal or more). It should look like this:
animals_id clus_ID type
1 L085 246 A
2 L085 246 A
3 L085 246 A
4 L084 247 A
5 L084 247 A
6 L084 247 A
7 L085 249 B
8 L084 249 B
9 L084 249 B
10 L087 249 B
Following this question, I created this code:
df %>% group_by(clus_ID) %>% mutate(test = ifelse(length(unique(df[,"animals_id"]))==1, "A", "B"))
and
df %>% group_by(clus_ID) %>% mutate(type = ifelse(n_distinct(animals_id) == 1, "A", "B"))
But neither of these works: the result is either all "A" or all "B". Any thoughts?
Dataset for reproduction:
> dput(df)
structure(list(animals_id = c("L085", "L085", "L085", "L084",
"L084", "L084", "L085", "L084", "L084", "L087", "L084", "L084",
"L084", "L084", "L084", "L084", "L084", "L084", "L084", "L084",
"L084"), clus_ID = c(246L, 246L, 246L, 247L, 247L, 247L, 249L,
249L, 249L, 249L, 249L, 249L, 249L, 249L, 249L, 249L, 249L, 249L,
249L, 249L, 249L)), class = "data.frame", row.names = c(366428L,
366429L, 366430L, 349169L, 349170L, 349171L, 366435L, 349185L,
349186L, 378191L, 349343L, 349345L, 349346L, 349347L, 349477L,
349478L, 349479L, 349480L, 349706L, 349869L, 350121L))
You could define an "A" cluster as one for which the minimum and maximum animals_id are the same value:
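df %>%
  group_by(clus_ID) %>%
  # min() and max() on character values compare alphabetically; if they match, the group holds one animal
  dplyr::mutate(test = ifelse(min(animals_id) == max(animals_id), "A", "B"))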
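For completeness, here is a minimal self-contained sketch of the same idea using n_distinct(), with the data rebuilt from the dput() output in the question (row names dropped for brevity). The first attempt in the question cannot work because df[,"animals_id"] refers to the whole data frame rather than the current group, so every row gets the same answer. The second attempt looks correct for dplyr; if it still returns a single value, one possible cause (an assumption, since the question does not show which packages are loaded) is that another package such as plyr masks mutate(), which is why the call below is namespaced explicitly.

```r
library(dplyr)

# Data from the question, rebuilt by hand (original row names omitted)
df <- data.frame(
  animals_id = c(rep("L085", 3), rep("L084", 3),
                 "L085", "L084", "L084", "L087", rep("L084", 11)),
  clus_ID = c(rep(246L, 3), rep(247L, 3), rep(249L, 15))
)

result <- df %>%
  group_by(clus_ID) %>%
  # n_distinct() is evaluated per group because animals_id is referenced
  # by its bare column name, not through df
  dplyr::mutate(type = ifelse(n_distinct(animals_id) == 1, "A", "B")) %>%
  ungroup()

# One row per cluster with its label
distinct(result, clus_ID, type)
```

With this data, clusters 246 and 247 each contain a single animal and are labelled "A", while cluster 249 contains three different animals and is labelled "B".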