
Color highlighting text in R for a pre-defined list of words


Suppose I have a collection of documents such as:

text = c("is it possible to highlight text for some words" , 
      "suppose i want words like words to be red and words like text to be blue")

I am wondering whether it is possible to highlight documents (particularly for a large corpus) with colors for a pre-defined list of words using R. Each word in the list gets a specific color: for example, "words" in red and "text" in blue.


Solution

  • This is a somewhat hackish solution to this question and will not scale well to a large corpus. I would be curious to see whether there is a more parsimonious, elegant, and scalable way to do this.

    library(tidyverse)
    library(crayon)
    
    # define text
    text <- c("is it possible to highlight text for some words" , 
             "suppose i want words like words to be red and words like text to be blue")
    
    # split each document into its unique words (repeated words are dropped)
    unique_words <- function(x) {
      purrr::map(.x = x,
                 .f = ~ unique(base::strsplit(x = ., split = " ")[[1]]))
    }
    
    # creating a dataframe with crayonized text
    df <- 
      tibble::enframe(unique_words(x = text)) %>%
      tidyr::unnest(cols = value) %>%
    # here you can specify the color/word combinations you need 
      dplyr::mutate(.data = .,
                    value2 = dplyr::case_when(value == "text" ~ crayon::blue(value),
                                              value == "words" ~ crayon::red(value),
                    TRUE ~ value)) %>%
      dplyr::select(., -value) 
    
    # printing the text (cat() already prints; wrapping it in print() would also emit NULL)
    cat(df$value2, "\n")
    


    P.S. Unfortunately, reprex doesn't work with colored text, so I can't provide a complete reprex.
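
As one possible answer to the scalability concern, here is a minimal base R sketch that keeps each document intact and colors every whole-word match with `gsub()`, which is vectorized over the documents. It is not the approach above: the helper name `highlight_words` and the `ansi` lookup vector are hypothetical, and it writes raw ANSI escape codes rather than calling crayon, so the colors only render in a terminal that understands ANSI sequences. It also assumes the words contain no regex metacharacters.

```r
# Hypothetical helper: wrap whole-word matches in ANSI color codes.
# `docs`   : character vector of documents
# `colors` : named character vector; names are words, values are ANSI codes
highlight_words <- function(docs, colors) {
  reset <- "\033[39m"  # restores the default foreground color
  for (word in names(colors)) {
    pattern <- paste0("\\b", word, "\\b")            # match whole words only
    replacement <- paste0(colors[[word]], word, reset)
    docs <- gsub(pattern, replacement, docs, perl = TRUE)
  }
  docs
}

text <- c("is it possible to highlight text for some words",
          "suppose i want words like words to be red and words like text to be blue")

# Assumed word/color mapping: 31 = red, 34 = blue
ansi <- c(words = "\033[31m", text = "\033[34m")

highlighted <- highlight_words(text, ansi)
cat(highlighted, sep = "\n")
```

The loop runs once per word in the list, not once per document, so the cost grows with the size of the word list while `gsub()` handles the whole corpus in each pass.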