
How to fill down a given text until the next given text, and so on, in R?


This has probably been answered already, but I'm struggling to find it: in a new column 'new_text', how can I fill a given text down until the next given text appears, and so on?

In the example below, how do I fill 'Potter' down to 'Wisley', then 'Wisley' down to 'Granger', and so on?

The idea is to apply the proposed solution to data frames with thousands of rows (obtained with pdftools::pdf_data), selecting a sequence of specific words to fill down in this way.

Thanks for your help.

> dat0
      text new_text
1   Potter   Potter
2     hj7d   Potter
3    kl8ep   Potter
4      f3d   Potter
5   rtyzs2   Potter
6   Wisley   Wisley
7     lq6s   Wisley
8      2fg   Wisley
9  Granger  Granger
10    r8ka  Granger
11      h9  Granger
12   qm9ne  Granger  

Data:

dat0 <-
structure(list(text = c("Potter", "hj7d", "kl8ep", "f3d", "rtyzs2", 
"Wisley", "lq6s", "2fg", "Granger", "r8ka", "h9", "qm9ne"), new_text = c("Potter", 
"Potter", "Potter", "Potter", "Potter", "Wisley", "Wisley", "Wisley", 
"Granger", "Granger", "Granger", "Granger")), class = "data.frame", row.names = c(NA, 
-12L))

Solution

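  • One way is to convert every value that is not one of the target words to NA and then use fill from tidyr, which carries the last non-missing value downward. You'll need to set up the specific words (names) that you want to keep first. Note that the code below overwrites the text column itself; to keep the original text, assign the ifelse() result to new_text and fill that column instead.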

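    library(tidyr)
    
    # Words to keep and fill down
    Names <- c("Potter", "Wisley", "Granger")
    
    # Set everything that is not a name to NA, then carry each name downward
    transform(dat0, text=ifelse(text %in% Names, text, NA)) |>
      fill(text)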
          text new_text
    1   Potter   Potter
    2   Potter   Potter
    3   Potter   Potter
    4   Potter   Potter
    5   Potter   Potter
    6   Wisley   Wisley
    7   Wisley   Wisley
    8   Wisley   Wisley
    9  Granger  Granger
    10 Granger  Granger
    11 Granger  Granger
    12 Granger  Granger
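
If you would rather leave 'text' untouched and write the result to 'new_text', here is a minimal base R sketch that needs no packages. It is an alternative to the tidyr approach above, not part of the original answer. The helper names is_name and grp are only illustrative, and it assumes that every word in Names marks the start of a new group.

```r
# Input column from the question (new_text is rebuilt below).
dat0 <- data.frame(
  text = c("Potter", "hj7d", "kl8ep", "f3d", "rtyzs2",
           "Wisley", "lq6s", "2fg", "Granger", "r8ka", "h9", "qm9ne"),
  stringsAsFactors = FALSE
)

# Words that start a new group (same vector as in the answer above).
Names <- c("Potter", "Wisley", "Granger")

# TRUE on rows that hold one of the target words.
is_name <- dat0$text %in% Names

# cumsum() numbers the groups: 0 before the first target word, then 1, 2, 3, ...
grp <- cumsum(is_name)

# Look up the target word for each group; rows that come before the
# first target word (group 0) get NA.
dat0$new_text <- c(NA_character_, dat0$text[is_name])[grp + 1]

print(dat0)
```

Because this is plain vector indexing, it should scale to data frames with thousands of rows without trouble. Rows that appear before the first target word end up as NA, which matches what fill does by default.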