I have extracted multiple tables from a PDF in which some strings span multiple lines. I used the extract_tables() function from the tabulizer package; the only problem is that each line of a multi-line string is imported as a separate row.
e.g.
action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")
df <- data.frame(action, description)
df
   action description
1       1           a
2      NA           b
3      NA           c
4       2           a
5      NA           b
6       3           a
7      NA           b
8      NA           c
9      NA           d
10      4           a
11     NA           b
I would like to concatenate the strings so that each action's description appears as a single element, such as:
  action description
1      1       a b c
2      2         a b
3      3     a b c d
4      4         a b
Hope that makes sense, appreciate any help!
A tidyverse way would be to fill the action column with the previous non-NA value, then group_by action and paste the description values together.
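library(tidyverse)
# df is the data frame from the question
df %>%
  fill(action) %>%        # carry the last non-NA action down into the NA rows
  group_by(action) %>%    # one group per action
  summarise(description = paste(description, collapse = " "))  # join each group's strings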
#   action description
#    <dbl> <chr>
# 1     1. a b c
# 2     2. a b
# 3     3. a b c d
# 4     4. a b
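For comparison, the same fill-then-group idea can be sketched in base R. This is only an illustration, not part of the answer above: the names grp and result are made up here, and it assumes the first row always has a non-NA action (otherwise the leading rows would get group index 0 and be dropped by the indexing step).

```r
# Rebuild the example data from the question
df <- data.frame(
  action = c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA),
  description = c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b"),
  stringsAsFactors = FALSE
)

# Every non-NA action starts a new group; cumsum() turns those
# starting points into a running group index (1, 1, 1, 2, 2, ...)
grp <- cumsum(!is.na(df$action))

# Carry the last non-NA action forward by looking up each row's group
# (a base R stand-in for tidyr::fill; assumes row 1 is not NA)
df$action <- df$action[!is.na(df$action)][grp]

# Paste the descriptions within each action, separated by spaces
result <- aggregate(description ~ action, data = df, FUN = paste, collapse = " ")
result
```

Here cumsum(!is.na(...)) does the work of both fill() and the grouping key, which is convenient when the tidyverse is not available; the tidyverse version is likely easier to read for longer pipelines.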