I have a tibble with a list of words for each row. I want to create a new variable from a function that searches for a keyword and, if it finds the keyword, creates a string composed of the keyword plus-and-minus 3 words.
The code below is close, but, rather than grabbing all three words before and after my keyword, it grabs the single word 3 ahead/behind.
df <- tibble(words = c("it", "was", "the", "best", "of", "times",
"it", "was", "the", "worst", "of", "times"))
df <- df %>% mutate(chunks = ifelse(words=="times",
paste(lag(words, 3),
words,
lead(words, 3), sep = " "),
NA))
The most intuitive solution would be if the lag
function could do something like this: lead(words, 1:3)
but that doesn't work.
Obviously I could pretty quickly do this by hand (paste(lead(words,3), lead(words,2), lead(words,1),...lag(words,3)
), but I'll eventually actually want to be able to grab the keyword plus-and-minus 50 words--too much to hand-code.
Would be ideal if a solution existed in the tidyverse, but any solution would be helpful. Any help would be appreciated.
One option would be sapply
:
library(dplyr)
df %>%
mutate(
chunks = ifelse(
words == "times",
sapply(
1:nrow(.),
function(x) paste(words[pmax(1, x - 3):pmin(x + 3, nrow(.))], collapse = " ")
),
NA
)
)
Output:
# A tibble: 12 x 2
words chunks
<chr> <chr>
1 it NA
2 was NA
3 the NA
4 best NA
5 of NA
6 times the best of times it was the
7 it NA
8 was NA
9 the NA
10 worst NA
11 of NA
12 times the worst of times
Although not an explicit lead
or lag
function, it can often serve the purpose as well.