
R - create a column indicating whether another column has the same values within each group


I have a dataset looking like this:

df
# A tibble: 21 × 2
   animals_id clus_ID
   <chr>        <int>
 1 L085           246
 2 L085           246
 3 L085           246
 4 L084           247
 5 L084           247
 6 L084           247
 7 L085           249
 8 L084           249
 9 L084           249
10 L087           249

I want to add another column, "type", that tells me whether animals_id varies within each clus_ID, that is, whether the cluster contains only one animal ("A") or more than one ("B"). It should look like this:

   animals_id clus_ID   type
 1 L085           246   A
 2 L085           246   A
 3 L085           246   A
 4 L084           247   A
 5 L084           247   A
 6 L084           247   A
 7 L085           249   B
 8 L084           249   B
 9 L084           249   B
10 L087           249   B

Following this question, I wrote this code:

 df %>% group_by(clus_ID) %>% mutate(test = ifelse(length(unique(df[,"animals_id"]))==1, "A", "B"))

and

 df %>% group_by(clus_ID) %>% mutate(type = ifelse(n_distinct(animals_id) == 1, "A", "B"))

But neither of these works: the result is either all "A" or all "B". Any thoughts?

Dataset for reproduction:

> dput(df)
structure(list(animals_id = c("L085", "L085", "L085", "L084", 
"L084", "L084", "L085", "L084", "L084", "L087", "L084", "L084", 
"L084", "L084", "L084", "L084", "L084", "L084", "L084", "L084", 
"L084"), clus_ID = c(246L, 246L, 246L, 247L, 247L, 247L, 249L, 
249L, 249L, 249L, 249L, 249L, 249L, 249L, 249L, 249L, 249L, 249L, 
249L, 249L, 249L)), class = "data.frame", row.names = c(366428L, 
366429L, 366430L, 349169L, 349170L, 349171L, 366435L, 349185L, 
349186L, 378191L, 349343L, 349345L, 349346L, 349347L, 349477L, 
349478L, 349479L, 349480L, 349706L, 349869L, 350121L))
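A quick check of the first attempt, run outside mutate() on the reproduction data, suggests why it cannot vary by group: df[, "animals_id"] indexes the whole df object, not the current clus_ID group, so every group sees the same value. The data below is rebuilt from the dput() output with the row names dropped, and as_tibble() is used only to mimic the "# A tibble" printout at the top (the dput() itself reports a plain data.frame). That difference may explain why the result is sometimes all "A" and sometimes all "B".

```r
library(dplyr)

# Reproduction data rebuilt from the dput() output (row names omitted).
df <- data.frame(
  animals_id = c("L085", "L085", "L085", "L084", "L084", "L084", "L085",
                 "L084", "L084", "L087", rep("L084", 11)),
  clus_ID = c(246L, 246L, 246L, 247L, 247L, 247L, rep(249L, 15))
)

# On a data.frame, df[, "animals_id"] drops to a character vector of all 21
# rows, so there are always 3 unique animals: "B" for every row.
length(unique(df[, "animals_id"]))  # 3

# On a tibble, the same subset stays a one-column tibble, and length()
# counts columns instead of rows: "A" for every row.
length(unique(as_tibble(df)[, "animals_id"]))  # 1
```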

Solution

  • You could define an A cluster as one in which the minimum and maximum animals_id are the same value, i.e. every row in the cluster belongs to the same animal:

    library(dplyr)

    # "A" if every row in the cluster has the same animals_id, otherwise "B"
    df %>%
      group_by(clus_ID) %>%
      dplyr::mutate(type = ifelse(min(animals_id) == max(animals_id), "A", "B"))
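The sketch below checks the min/max answer against the reproduction data and compares it with the n_distinct() version from the question, which should give the same result when dplyr's mutate() is the one being called. The explicit dplyr::mutate is there on the assumption (not stated in the question) that plyr may be attached after dplyr: plyr::mutate() does not know about dplyr groups, so n_distinct() would then count all three animals in the whole column and return "B" for every row. The data is rebuilt from the dput() output with row names dropped.

```r
library(dplyr)

# Reproduction data rebuilt from the dput() output (row names omitted).
df <- data.frame(
  animals_id = c("L085", "L085", "L085", "L084", "L084", "L084", "L085",
                 "L084", "L084", "L087", rep("L084", 11)),
  clus_ID = c(246L, 246L, 246L, 247L, 247L, 247L, rep(249L, 15))
)

# Question's second attempt: one distinct animal per cluster means "A".
# dplyr::mutate is named explicitly so a masking plyr::mutate is never used.
res <- df %>%
  group_by(clus_ID) %>%
  dplyr::mutate(type = if_else(n_distinct(animals_id) == 1, "A", "B")) %>%
  ungroup()

# The min/max comparison from the answer.
res2 <- df %>%
  group_by(clus_ID) %>%
  dplyr::mutate(type = ifelse(min(animals_id) == max(animals_id), "A", "B")) %>%
  ungroup()

identical(res$type, res2$type)  # TRUE
table(res$clus_ID, res$type)    # 246 and 247 are all "A", 249 is all "B"
```

Both versions compare within each clus_ID group; min()/max() on character vectors uses lexicographic order, which is enough to detect whether all values in a group are equal.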