r, dataframe, dplyr, grouping, string-concatenation

Changing the contents of a column according to the values of other columns using dplyr


I have the following data frame, with many different values in the page and passage columns:

df <- read.table(text="page passage  person index text
1  123   A   1 hello      
1  123   A   2 my
1  123   A   3 name
1  123   A   4 is
1  123   A   5 guy
1  124   B   1 well
1  124   B   2 hello
1  124   B   3 guy", header = TRUE, stringsAsFactors = FALSE)

I want to concatenate the content of the text column within each group defined by these columns, so that the result looks like this:

1  123   A   1 hello my name is guy    
1  123   A   2 hello my name is guy
1  123   A   3 hello my name is guy
1  123   A   4 hello my name is guy
1  123   A   5 hello my name is guy
1  124   B   1 well hello guy
1  124   B   2 well hello guy
1  124   B   3 well hello guy

Solution

  • Use paste with collapse inside a grouping function. The examples group by person, which identifies each page/passage combination in the sample data:

    base R

    df$text <- ave(df$text, df$person, FUN = function(x) paste(x, collapse = " "))
    

    dplyr

    library(dplyr)
    df %>% 
      group_by(person) %>% 
      mutate(text = paste(text, collapse = " ")) %>% 
      ungroup()  # drop the grouping so later steps work on the whole table
    

    data.table

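    library(data.table)
    # setDT() converts df to a data.table by reference; := updates text in place
    setDT(df)[, text := paste(text, collapse = " "), by = person]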
    

    output

       page passage person index text                
      <int>   <int> <chr>  <int> <chr>               
    1     1     123 A          1 hello my name is guy
    2     1     123 A          2 hello my name is guy
    3     1     123 A          3 hello my name is guy
    4     1     123 A          4 hello my name is guy
    5     1     123 A          5 hello my name is guy
    6     1     124 B          1 well hello guy      
    7     1     124 B          2 well hello guy      
    8     1     124 B          3 well hello guy
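
If person alone does not identify a group in the full data (the question mentions many different page and passage values), one option is to group on all three columns. The following is a minimal base R sketch, not part of the original answer. It assumes the words should be joined in index order, so it sorts the rows first; that sorting step is an added assumption.

```r
# Rebuild the example data from the question.
df <- read.table(text = "page passage person index text
1 123 A 1 hello
1 123 A 2 my
1 123 A 3 name
1 123 A 4 is
1 123 A 5 guy
1 124 B 1 well
1 124 B 2 hello
1 124 B 3 guy", header = TRUE, stringsAsFactors = FALSE)

# Assumption: words should be pasted in index order within each group,
# so sort the rows before concatenating.
df <- df[order(df$page, df$passage, df$person, df$index), ]

# ave() accepts several grouping vectors; every distinct combination of
# page, passage and person is treated as one group.
df$text <- ave(df$text, df$page, df$passage, df$person,
               FUN = function(x) paste(x, collapse = " "))

print(df)
```

With dplyr, the same idea would be to call group_by(page, passage, person) before the mutate() step.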