r, regex, string, latex, symbols

Detect (and replace) math symbols in a string


TL;DR: How can I detect the presence of math symbols in a string?

I collect a lot of text data from others, through sources like Google Forms or directly in spreadsheets. Often the individuals doing the data entry copy text from somewhere else, say a webpage or PDF, and math symbols come along with the text.

Example string where $\pi$ was copied as a symbol: "and π-d orbital"

R reads this in perfectly fine, and in markdown code it will even print/display it perfectly fine in an HTML format (see example). However, I need to render this text content to PDF.

LaTeX, of course, does not like this and throws the following error:

! LaTeX Error: Unicode character μ (U+03BC)
               not set up for use with LaTeX.

I would like to write some gsub/str_detect-type code to find any special characters so I can replace them with the proper LaTeX command, e.g. $\pi$.

I tried the following code to detect non-letter characters, but it didn't work: it returned FALSE, meaning it didn't detect the symbol.

stringr::str_detect("and π-d orbital", "[a-zA-Z]", negate = TRUE)

Suggestions? Is there a LaTeX solution?


Solution

  • Setting negate = TRUE asks, "does this string contain no characters that match [a-zA-Z]?". Your string contains plenty of letters, so the answer is FALSE. That is a different question from the one you want, which is "does this string contain any character that isn’t in [a-zA-Z]?". To ask that question, put ^ at the start of the character class, inside the []. Note you’ll also want to include whitespace, "-", and any other "acceptable" characters in the class.

    stringr::str_detect("and π-d orbital", "[^\\s\\-a-zA-Z]")
    # TRUE
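
For the "replace" half of the question, one possible approach is sketched below. It is only a sketch under a few assumptions: the sample vector `x`, the lookup vectors `symbols` and `commands`, and the choice to treat every non-ASCII character as "special" are all illustrative and not from the original post. It uses `stringi::stri_replace_all_fixed()` so that the `$` and `\` in the LaTeX replacements are taken literally rather than interpreted as regex replacement syntax.

```r
library(stringr)
library(stringi)

# Illustrative input (assumed); \u03c0 is the Greek letter pi, \u03bc is mu.
x <- c("and \u03c0-d orbital", "5 \u03bcm filter", "plain text")

# Step 1: list every distinct non-ASCII character, so you can see
# which symbols need an entry in the lookup below.
non_ascii <- unique(unlist(str_extract_all(x, "[^\\x00-\\x7F]")))

# Step 2: hypothetical lookup, symbol -> LaTeX command.
# Extend both vectors (same order) with whatever turns up in step 1.
symbols  <- c("\u03c0", "\u03bc")
commands <- c("$\\pi$", "$\\mu$")

# Step 3: apply each literal replacement in turn to every string.
# vectorize_all = FALSE means "all patterns on all strings".
x_tex <- stri_replace_all_fixed(x, symbols, commands, vectorize_all = FALSE)

# Step 4: flag strings that still contain an unmapped non-ASCII character.
still_bad <- str_detect(x_tex, "[^\\x00-\\x7F]")
```

Here `non_ascii` tells you which characters to add to the lookup, and any `TRUE` in `still_bad` marks a string that would likely still trip the LaTeX error. Look-alike code points are worth checking for: the micro sign (U+00B5) is a different character from the Greek letter mu (U+03BC), so each needs its own entry.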