I once again have a question about the kwic() function from the quanteda package. I want to extract the five words on either side of a specific keyword (in the example below, "stack overflow" and "radio star"). However, after removing stopwords during tokenization, kwic() no longer returns a full window of five words before and after the keyword, but fewer words than that. Is there a way to tell kwic() to ignore stopwords when counting the words in the context window?
Reprex below:
library(quanteda)

speech <- c("This is the first speech. Many words are in this speech, but only few are relevant for my research question. One relevant word, for example, is the word stack overflow. However there are so many more words that I am not interested in assessing the sentiment of. Now I am also adding a few words that would not be removed as stopwords, as follows: Maintenance, Television, Superstar, Textual Analysis. Video killed the radio star is another sentence I would like to include.",
            "This is a second speech, much shorter than the first one. It still includes the word of interest, but at the very end. stack overflow. Once again adding some non-stopwords: Maintenance, television, superstar, textual analysis. Video killed the radio star is another sentence I would like to include.",
            "Finally, this is the third speech, and this speech does not include the word of interest so I'm not interested in assessing this speech. Here are some more non-stopwords: Maintenance, television, superstar, textual analysis")

data <- data.frame(id = 1:3,
                   speechContent = speech)

test_corpus <- corpus(data,
                      docid_field = "id",
                      text_field = "speechContent")

test_tokens <- tokens(test_corpus,
                      remove_punct = TRUE,
                      remove_numbers = TRUE) %>%
  tokens_remove(stopwords("en"), padding = TRUE) %>%
  tokens_compound(pattern = phrase(c("stack overflow*", "radio star*")),
                  concatenator = " ")

test_kwic <- kwic(test_tokens,
                  pattern = c("stack overflow", "radio star"),
                  window = 5)
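As @phiver suggested, using padding = FALSE when removing stopwords fixed the issue: with padding = TRUE, each removed stopword leaves an empty placeholder token behind, and kwic() counts those placeholders toward the window. Thank you!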
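For anyone landing here later, this is a minimal sketch of the difference, using one shortened sentence from the first speech rather than the full reprex. The object names (txt, toks_padded, toks_compact and so on) are made up for illustration, and the steps are written without a pipe so the snippet does not depend on where %>% comes from.

```r
library(quanteda)

# Shortened excerpt of the first speech (illustrative input only)
txt <- "One relevant word, for example, is the word stack overflow. However there are so many more words that I am not interested in."

toks <- tokens(txt, remove_punct = TRUE)

# padding = TRUE: each removed stopword leaves an empty "" token in its place
toks_padded <- tokens_remove(toks, stopwords("en"), padding = TRUE)
toks_padded <- tokens_compound(toks_padded,
                               pattern = phrase("stack overflow*"),
                               concatenator = " ")

# padding = FALSE: removed stopwords are dropped and the remaining tokens close up
toks_compact <- tokens_remove(toks, stopwords("en"), padding = FALSE)
toks_compact <- tokens_compound(toks_compact,
                                pattern = phrase("stack overflow*"),
                                concatenator = " ")

# Same keyword and window; only the handling of removed stopwords differs
kwic_padded  <- kwic(toks_padded,  pattern = "stack overflow", window = 5)
kwic_compact <- kwic(toks_compact, pattern = "stack overflow", window = 5)

print(kwic_padded)
print(kwic_compact)
```

Comparing the two printed results should show the padded version spending part of its five-token window on empty placeholders, while the compact version fills the window with actual words.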