I am extracting tables from Word documents using the docxtractr package, but one of my tables does not come out in the shape I need.
After extraction, it looks like this:
Column A | Column A value |
---|---|
Column B value | |
Column C | Column C value |
and I want it to look like this:
Column A | Column B | Column C |
---|---|---|
Column A value | Column B value | Column C value |
Is there a way to reshape the first table into the second?
Or perhaps a better way of extracting the values/tables from the word document?
TIA
I'm still looking for solutions.
If d is your table ...
## create example table:
d <- structure(list(Var1 = c("Column A", NA, "Column C"), Var2 = c("Column A value",
"Column B value", "Column C value")), class = "data.frame", row.names = c(NA,
3L))
> d
Var1 Var2
1 Column A Column A value
2 <NA> Column B value
3 Column C Column C value
... you can use {dplyr} and {tidyr} to substitute missing column names and reshape to wide format like this:
library(dplyr)
library(tidyr)
d |>
  mutate(Var1 = ifelse(is.na(Var1), paste0('Column_', row_number()), Var1)) |>
  pivot_wider(names_from = Var1, values_from = Var2)
`Column A` Column_2 `Column C`
<chr> <chr> <chr>
1 Column A value Column B value Column C value
You might need to set the header argument to FALSE upon import: docx_extract_tbl(..., header = FALSE, ...)
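Putting the import and the reshape together, a minimal sketch could look like the following. The file name report.docx, the table number, and the helper name kv_to_wide are assumptions for illustration, not something from your document. The helper also treats empty strings as missing names, on the assumption that an empty Word cell may arrive as "" rather than NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: reshape a two-column key/value table (one row per
# field) into a single wide row. Assumes column 1 holds the field names
# and column 2 holds the values.
kv_to_wide <- function(tbl) {
  tbl |>
    select(name = 1, value = 2) |>
    mutate(
      # treat blank cells as missing names
      name = na_if(trimws(name), ""),
      # substitute a placeholder name based on the row position
      name = ifelse(is.na(name), paste0("Column_", row_number()), name)
    ) |>
    pivot_wider(names_from = name, values_from = value)
}

# Hypothetical file name and table number; adjust both to your document.
path <- "report.docx"
if (file.exists(path)) {
  doc <- docxtractr::read_docx(path)
  # header = FALSE keeps the first row ("Column A | Column A value") as data
  raw <- docxtractr::docx_extract_tbl(doc, tbl_number = 1, header = FALSE)
  print(kv_to_wide(raw))
}
```

Selecting the columns by position avoids relying on whatever column names the import assigns when header = FALSE. If two rows end up with the same name, pivot_wider() will warn and return list-columns, so it is worth checking the first column before reshaping.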