Conditionally concatenate string over multiple rows


I have extracted multiple tables from a PDF in which some strings span multiple lines. I used the extract_tables() function from the tabulizer package; the only problem is that each line of a multi-line string is imported as a separate row.

e.g.

action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)

description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")

df <- data.frame(action, description)
df

   action description
1       1           a
2      NA           b
3      NA           c
4       2           a
5      NA           b
6       3           a
7      NA           b
8      NA           c
9      NA           d
10      4           a
11     NA           b

I would like to concatenate the strings so that each action's description appears as a single element, like this:

  action description
1      1       a b c
2      2         a b
3      3     a b c d
4      4         a b

Hope that makes sense, appreciate any help!


Solution

  • A tidyverse way is to fill the action column with the previous non-NA value, then group_by(action) and paste the descriptions in each group together.

    library(tidyverse)
    
    df %>%
      fill(action) %>%        # carry the last non-NA action down
      group_by(action) %>%    # one group per action
      summarise(description = paste(description, collapse = " "))
    
    
    #  action description
    #   <dbl> <chr>      
    #1     1. a b c      
    #2     2. a b        
    #3     3. a b c d    
    #4     4. a b
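
For comparison, here is a minimal base R sketch of the same fill-then-paste idea, in case loading the tidyverse is not an option. It is not part of the original answer. The names grp, filled and result are made up for illustration, and the code assumes the first row always has a non-NA action (otherwise the carry-forward step would need extra handling).

```r
# Example data from the question
action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")
df <- data.frame(action, description)

# Every non-NA action starts a new group; cumsum() numbers the groups 1, 2, 3, ...
grp <- cumsum(!is.na(df$action))

# Carry the last non-NA action forward by indexing the non-NA values with grp
# (assumption: the first row is not NA, so grp never contains 0)
filled <- df$action[!is.na(df$action)][grp]

# Paste the descriptions within each action into one string
result <- aggregate(
  description ~ action,
  data = data.frame(action = filled, description = df$description),
  FUN = paste, collapse = " "
)

print(result)
```

The cumsum(!is.na(...)) step stands in for fill(), and aggregate() stands in for group_by() plus summarise(). Either approach should give one row per action with the descriptions joined by spaces.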