TL;DR: How can I detect the presence of math symbols in a string?
I collect a lot of text data from others, through sources like Google Forms or directly in spreadsheets. The people doing the data entry often copy text from somewhere else, say a webpage or PDF, and math symbols come along with the text.
Example string, where $\pi$ was copied as a symbol: "and π-d orbital"
R reads this in perfectly fine, and in R Markdown it even prints and displays correctly in HTML output (see example). However, I need to render this text content to PDF, which LaTeX does not like. It throws the following error:
! LaTeX Error: Unicode character μ (U+03BC)
not set up for use with LaTeX.
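The error identifies the offending character only by its Unicode code point. When building a list of symbols to replace, it can help to convert between characters and code points in R. A small sketch using base R's `utf8ToInt()` and `intToUtf8()`; the helper name `code_point()` is made up for illustration.

```r
# Hypothetical helper: format a single character as "U+XXXX",
# the notation LaTeX uses in its "Unicode character ... not set up" error
code_point <- function(ch) sprintf("U+%04X", utf8ToInt(ch))

code_point("\u03bc")  # "U+03BC" (Greek small letter mu)
code_point("\u03c0")  # "U+03C0" (Greek small letter pi)

# And back again: from the code point in the error message to the character
intToUtf8(0x03BC)
```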
I would like to write some gsub/str_detect-type code to find any special characters so I can replace them with their proper LaTeX symbol, e.g. $\pi$.
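One way to handle the replacement step is a lookup table that maps each symbol to its LaTeX markup, applied character by character. This is a minimal base R sketch. `latex_map` and `to_latex()` are hypothetical names, and the table covers only three Greek letters, so extend it with whatever your data actually contains. Splitting the string into characters avoids the escaping rules that regex replacement strings apply to backslashes.

```r
# Hypothetical lookup table: Unicode character -> LaTeX math markup.
# "$\\pi$" in R source is the string $\pi$ once parsed.
latex_map <- setNames(
  c("$\\pi$", "$\\mu$", "$\\alpha$"),
  c("\u03c0", "\u03bc", "\u03b1")  # Greek pi, mu, alpha
)

# Replace every character that appears in `map` with its LaTeX markup;
# characters not in the table are left untouched.
to_latex <- function(x, map = latex_map) {
  vapply(strsplit(x, ""), function(chars) {
    hit <- chars %in% names(map)
    chars[hit] <- map[chars[hit]]
    paste(chars, collapse = "")
  }, character(1))
}

writeLines(to_latex("and \u03c0-d orbital"))  # and $\pi$-d orbital
```

The function is vectorised over `x`, so it can be applied directly to a whole spreadsheet column.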
I tried the following code to detect non-letter characters, but it didn't work: it returned FALSE, meaning it didn't detect the symbols.
stringr::str_detect("and π-d orbital", "[a-zA-Z]", negate = TRUE)
Suggestions? Is there a LaTeX solution?
Setting negate = TRUE is basically asking, "does this string not include any characters in "[a-zA-Z]"?". That is a different question from the one you want, which is "does this string include any characters that aren’t in "[a-zA-Z]"?". To ask that question, put ^ at the start of the [] character class to negate it. Note you’ll also want to include whitespace, "-", and any other "acceptable" characters (digits, punctuation, and so on) in the class.
stringr::str_detect("and π-d orbital", "[^\\s\\-a-zA-Z]")
# TRUE
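An alternative to whitelisting every acceptable character is to flag anything outside the ASCII range, then collect the distinct offenders across the whole column as a checklist of symbols to handle. A sketch in base R using a PCRE pattern; the example strings are made up. This may over-flag characters LaTeX typesets without complaint, such as accented Latin letters.

```r
x <- c("and \u03c0-d orbital", "plain ASCII, 3.5 mg/L", "10 \u03bcm particles")

# PCRE character class: any character outside the ASCII range 0x01-0x7F
non_ascii <- "[^\\x01-\\x7F]"

# TRUE for each string that contains at least one non-ASCII character
grepl(non_ascii, x, perl = TRUE)
# TRUE FALSE TRUE

# The offending characters themselves, de-duplicated across all strings
found <- unique(unlist(regmatches(x, gregexpr(non_ascii, x, perl = TRUE))))
found  # the pi and mu characters
```

For a quick interactive report, base R's `tools::showNonASCII()` prints the elements of a character vector that contain non-ASCII characters.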