Search code examples
rdataframeduplicatesmutate

How to add column with altered ID-names for duplicates within same day


I have a data frame with Days, ID, Date collected and a count value (number of hatched eggs) for several samples each day. The ID stems from the replicate (mother) from which the sample (a number of eggs) was taken, so it requires the information from the "Date collected" column in order to distinguish them as separate samples in for instance a plot.

I want to add a new column called sampleID in which I give each unique sample its own ID.

Example data:

d1 <- as.Date('2021-06-07')
d2 <- as.Date('2021-06-08')
d3 <- as.Date('2021-06-09')
df <- data.frame(Days = c(1,1,2,2,2,2,3,3,3,3,3),
ID = c(2,5,2,2,5,9,2,2,5,5,9),
Collected =c(d1,d1,d2,d1,d1,d2,d1,d2,d1,d3,d2))

I would like an output to look like:

Days ID Collected SampleID Count
1 2 2021-06-07 2-1 3
1 5 2021-06-07 5-1 5
2 2 2021-06-08 2-1 4
2 2 2021-06-07 2-2 1
2 5 2021-06-07 5-1 7
2 9 2021-06-08 9-1 2
3 2 2021-06-07 2-1 8
3 2 2021-06-08 2-2 5
3 5 2021-06-07 5-1 7
3 5 2021-06-09 5-2 2
3 9 2021-06-08 9-1 2

and I have been trying something like:

df <- df %>% 
group_by(Days) %>% 
mutate(ReplicateID = case_when(ID == ID & Collected != Collected ~ paste(as.character(ID)+"-1")))

Which doesn't work, but even if it did it would not be able to add -2 or -3 to ID's repeated more than once within the same day.. So I am kind of lost and would appreciate some help!


Solution

  • Maybe something like this?

    library(dplyr)
    d1 <- as.Date('2021-06-07')
    d2 <- as.Date('2021-06-08')
    d3 <- as.Date('2021-06-09')
    df <- data.frame(Days = c(1,1,2,2,2,2,3,3,3,3,3),
                     ID = c(2,5,2,2,5,9,2,2,5,5,9),
                     Collected =c(d1,d1,d2,d1,d1,d2,d1,d2,d1,d3,d2))
    
    df |>
      arrange(Days,ID,Collected) |>
      group_by(Days,ID) |>
      mutate(SampleID = paste(ID,row_number(),sep = '-'))