Search code examples
rdataframedplyrfilterduplicates

How to duplicate and replace certain values in duplicated records multiple times using R and dplyr?


I have records in a dataframe that I want to duplicate multiple times, and apply some changes to each iteration of duplicate rows.

Using the following dataframe:

name <- c("Kiwi","Orange","Apple")
val1 <- c(1,9,10)
val2 <- c(1,1,1)

df <- data.frame(name, val1, val2)

I can duplicate the row where name == "Apple" and replace the name in the duplicate row:

df.2 <- df %>% 
            filter(name == "Apple") %>% 
            mutate(name = "Strawberry") %>%
            bind_rows(., df)

However running this operation more than once does not work as expected:

df.2 <- df %>% 
            filter(name == "Apple") %>% 
            mutate(name = "Strawberry") %>%
            bind_rows(., df) %>%
            filter(name == "Apple") %>% 
            mutate(name = "Dragonfruit") %>%
            bind_rows(., df)

Expected result:

val1 <- c(10,10,1,9,10)
val2 <- c(1,1,1,1,1)
name <- c("Dragonfruit","Strawberry","Kiwi","Orange","Apple")

expected.result <- data.frame(name, val1, val2)
            

Solution

  • Here is another way of doing it:

    df.2 <- bind_rows(df, 
                      df %>% filter(name == "Apple") %>% mutate(name = "Strawberry"),
                      df %>% filter(name == "Apple") %>% mutate(name = "Dragonfruit"))
    

    Of course, it's more concise to make a new variable for df %>% filter(name == "Apple"), and then use that, but not massively more, so I didn't, but you can.

    To find the issue with your code, let's go through it line by line:

    df.2 <- df %>% 
                filter(name == "Apple") %>% 
                mutate(name = "Strawberry") %>%
    

    at this point, the dataframe is one row Strawberry 10 1

                bind_rows(., df) %>%
    

    this value is added to the beginning of the old dataframe

                filter(name == "Apple") %>%
    

    ...but then you promptly remove it again, by filtering only for rows where the name is "Apple"

                mutate(name = "Dragonfruit") %>%
                bind_rows(., df)
    

    hence, at the end, you don't have the strawberry row anymore.