This has probably been answered already, but I can't find it: how can I create a new column 'new_text' in which a given word is filled down until the next given word appears, and so on?
In the example below, how do I fill 'Potter' down to 'Wisley', then 'Wisley' down to 'Granger', etc.?
The idea is to apply the solution to data frames with thousands of rows (obtained with pdftools::pdf_data), choosing a specific sequence of words to fill down in this way.
Thanks for your help.
> dat0
      text new_text
1   Potter   Potter
2     hj7d   Potter
3    kl8ep   Potter
4      f3d   Potter
5   rtyzs2   Potter
6   Wisley   Wisley
7     lq6s   Wisley
8      2fg   Wisley
9  Granger  Granger
10    r8ka  Granger
11      h9  Granger
12   qm9ne  Granger
Data:
dat0 <-
structure(list(text = c("Potter", "hj7d", "kl8ep", "f3d", "rtyzs2",
"Wisley", "lq6s", "2fg", "Granger", "r8ka", "h9", "qm9ne"), new_text = c("Potter",
"Potter", "Potter", "Potter", "Potter", "Wisley", "Wisley", "Wisley",
"Granger", "Granger", "Granger", "Granger")), class = "data.frame", row.names = c(NA,
-12L))
One way is to convert the non-names to NA and then use fill from tidyr. You'll need to set up the specific words (names) that you want to keep first.
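library(tidyr)
# Marker words to keep; every other value is treated as filler
Names <- c("Potter", "Wisley", "Granger")
# Replace non-marker values with NA, then carry the last marker downward
transform(dat0, text = ifelse(text %in% Names, text, NA)) |>
  fill(text)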
      text new_text
1   Potter   Potter
2   Potter   Potter
3   Potter   Potter
4   Potter   Potter
5   Potter   Potter
6   Wisley   Wisley
7   Wisley   Wisley
8   Wisley   Wisley
9  Granger  Granger
10 Granger  Granger
11 Granger  Granger
12 Granger  Granger
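Note that the pipeline above overwrites text. If the original text column has to stay intact and the result should go into new_text, as the question asks, a base R variant might look like the sketch below. It is not part of the answer above: it assumes the input has only a text column, and the names dat, is_name, grp and markers are made up for illustration. Rows that come before the first marker word get NA.

```r
# Hypothetical input: only the raw 'text' column, as in the question
dat <- data.frame(text = c("Potter", "hj7d", "kl8ep", "f3d", "rtyzs2",
                           "Wisley", "lq6s", "2fg", "Granger", "r8ka",
                           "h9", "qm9ne"))

# Marker words to fill down
Names <- c("Potter", "Wisley", "Granger")

# TRUE on rows that hold a marker word
is_name <- dat$text %in% Names

# Running count of markers seen so far: 0 before the first marker,
# 1 from the first marker onward, 2 from the second, and so on
grp <- cumsum(is_name)

# Marker words in order of appearance
markers <- dat$text[is_name]

# Look up the most recent marker for each row; the leading NA
# covers rows that precede the first marker (grp == 0)
dat$new_text <- c(NA, markers)[grp + 1]

print(dat)
```

Because this only uses cumsum and vector indexing, it needs no extra packages and leaves text unchanged, which may be convenient when the other pdf_data columns still refer to the original words.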