
Extract bold and italic text from a text document


I have text files in which I highlight certain text in bold and italics. I would like a script that reads the .txt file and exports all text that is bold or italic into another document (a text file).

Anyone know a way?

Preferably an R solution, but I can try other solutions.

Mac user


Solution

  • Suppose we have a Markdown-formatted text file in.md and we want to create another Markdown file out.md containing only the italic and bold sections.

    Content of file in.md:

    # Header
    
    There is *italic* and **bold** text!
    There is *another italic* and **another bold** text!
    
    R script:

    library(tidyverse)
    
    # Read the whole file into a single string
    text <- read_file("in.md")
    # Match **...** spans, then strip the asterisks
    bold_texts <- text %>%
      str_extract_all("\\*\\*[^*]+\\*\\*") %>%
      unlist() %>%
      str_remove_all("\\*")
    bold_texts
    #> [1] "bold"         "another bold"
    # Drop the bold spans first so their asterisks are not mistaken for italics
    italic_texts <- text %>%
      str_remove_all("\\*\\*[^*]+\\*\\*") %>%
      str_extract_all("\\*[^*]+\\*") %>%
      unlist() %>%
      str_remove_all("\\*")
    italic_texts
    #> [1] "italic"         "another italic"
    
    # Assemble the output: a heading line followed by the matches of each kind
    out_text <- c("#Bold texts:", bold_texts, "#Italic texts:", italic_texts) %>% paste0(collapse = "\n")
    cat(out_text)
    #> #Bold texts:
    #> bold
    #> another bold
    #> #Italic texts:
    #> italic
    #> another italic
    write_file(out_text, "out.md")
    

    Created on 2021-11-23 by the reprex package (v2.0.1)
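
The steps above can also be wrapped in a small reusable function, which helps if several files need processing. This is only a sketch: the name `extract_emphasis` is hypothetical (it is not part of any package), and it assumes emphasis is written with asterisks only, with no nesting and no underscore-style markers.

```r
library(stringr)

# Hypothetical helper (not from the original answer): returns the bold and
# italic spans found in a Markdown string. Assumes asterisk delimiters only.
extract_emphasis <- function(text) {
  bold_pattern <- "\\*\\*[^*]+\\*\\*"
  italic_pattern <- "\\*[^*]+\\*"

  # Collect **...** spans and strip the asterisks
  bold <- str_remove_all(unlist(str_extract_all(text, bold_pattern)), "\\*")

  # Remove bold spans before looking for italics, so that the double
  # asterisks are not picked up by the single-asterisk pattern
  without_bold <- str_remove_all(text, bold_pattern)
  italic <- str_remove_all(
    unlist(str_extract_all(without_bold, italic_pattern)),
    "\\*"
  )

  list(bold = bold, italic = italic)
}

sample_text <- "There is *italic* and **bold** text!\nNo emphasis on this line."
result <- extract_emphasis(sample_text)
result$bold
result$italic
```

To run it on a file, pass the file contents as one string, for example `extract_emphasis(readr::read_file("in.md"))`. Input without any emphasis yields empty character vectors rather than an error.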