Suppose I have a collection of documents such as:
text = c("is it possible to highlight text for some words" ,
"suppose i want words like words to be red and words like text to be blue")
I am wondering whether it is possible, using R, to highlight documents (particularly in a large corpus) with colors for a pre-defined list of words. Each word in the list would get a specific color, for example "words" in red and "text" in blue.
This is a somewhat hackish solution to this question and it will not scale well to a large corpus. I would be curious to see whether there is a more parsimonious, elegant, and scalable way to do this.
library(tidyverse)
library(crayon)
# define text
text <- c("is it possible to highlight text for some words" ,
"suppose i want words like words to be red and words like text to be blue")
# split each document into its unique words
# (note: unique() drops repeated words, so the result is not the full text)
unique_words <- function(x) {
  purrr::map(.x = x,
             .f = ~ unique(base::strsplit(x = ., split = " ")[[1]]))
}
# creating a dataframe with crayonized text
df <-
tibble::enframe(unique_words(x = text)) %>%
tidyr::unnest(cols = value) %>%
# here you can specify the color/word combinations you need
dplyr::mutate(.data = .,
value2 = dplyr::case_when(value == "text" ~ crayon::blue(value),
value == "words" ~ crayon::red(value),
TRUE ~ value)) %>%
dplyr::select(., -value)
# printing the text
cat(df$value2, "\n")
P.S. Unfortunately, reprex doesn't work with colored text, so I can't produce a complete reprex.
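For what it's worth, here is a rough sketch of a variant that keeps each document intact rather than reducing it to unique words. It builds one regular expression from the word list and styles every match in place with base R's gregexpr() and regmatches(). The names word_styles and highlight_words are made up for this illustration, and the sketch assumes the words are plain alphanumeric tokens with no regex metacharacters. I have not benchmarked it on a large corpus.

```r
library(crayon)

text <- c("is it possible to highlight text for some words",
          "suppose i want words like words to be red and words like text to be blue")

# hypothetical lookup table: word -> crayon styling function
word_styles <- list(words = crayon::red, text = crayon::blue)

# hypothetical helper: style every whole-word match in place,
# leaving the rest of each document untouched
highlight_words <- function(docs, styles) {
  # one alternation pattern for all words, anchored at word boundaries
  pattern <- paste0("\\b(", paste(names(styles), collapse = "|"), ")\\b")
  hits <- gregexpr(pattern, docs, perl = TRUE)
  # replace each matched word with its styled version
  regmatches(docs, hits) <- lapply(regmatches(docs, hits), function(found) {
    vapply(found, function(w) as.character(styles[[w]](w)),
           character(1), USE.NAMES = FALSE)
  })
  docs
}

highlighted <- highlight_words(text, word_styles)
cat(highlighted, sep = "\n")
```

Because all words are matched in a single pass, the escape codes inserted for one word are never re-scanned when matching another. Colors only show up in a terminal or console that crayon detects as color-capable.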