Tags: r, string, integer, character, roman-numerals

How can I identify Roman numerals and convert them to integers in "mixed" observations in R?


I have a data frame with a column whose observations mix words and Roman numerals. The column also contains integers, plain words (like the observation "Apple"), and NAs, all of which I want to leave unchanged.

So it has observations like:

x <- data.frame(col = c("15", "NA", "0", "Red", "iv", "Logic", "ix. Sweet", "VIII - Apple", 
"Big XVI", "WeirdVII", "XI: Small"))

What I want is to convert every observation that contains a Roman numeral (even the ones that are mixed with words) into the corresponding integer. So, following the example, the resulting column would be:

15 NA 0 Red 4 Logic 9 8 16 7 11

Is there any way to do this?

What I have attempted is:

library(stringr)
library(gtools)

roman <- str_extract(x$col, "([IVXivx]+)")
roman_to_int <- roman2int(roman)
x$col <- ifelse(!is.na(roman_to_int), roman_to_int, x$col)

However, this does not work, because observations that are plain words with no Roman numeral in them are converted as well. For example, "Logic" becomes "1", since the pattern matches the lone "i" inside the word. I want to avoid this.
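
The problem can be reproduced in isolation. The sketch below is only an illustration and uses base R stand-ins as an assumption, so that it runs without any packages: regexpr()/regmatches() in place of stringr::str_extract(), and utils::as.roman() in place of gtools::roman2int(). It suggests that the unanchored pattern grabs the first "i" it finds, which also affects "Big XVI" and "WeirdVII", not just "Logic".

```r
# Three observations from the example that the attempt gets wrong.
words <- c("Logic", "Big XVI", "WeirdVII")

# First match of the unanchored pattern in each string
# (base R stand-in for stringr::str_extract).
first_hit <- regmatches(words, regexpr("[IVXivx]+", words))
first_hit
# "i" "i" "i"

# Converting those matches (base R stand-in for gtools::roman2int).
converted <- as.integer(utils::as.roman(first_hit))
converted
# 1 1 1
```

In each case the first match is a stray "i" inside an ordinary word, so the real numeral later in the string is never reached.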


Solution

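  • library(stringr)
    
    # Match either a run of two or more uppercase numeral letters anywhere,
    # or a whole word made up only of numeral letters (lowercase or uppercase).
    pat <- "[IVXLCDM]{2,}|\\b[ivxlcdm]+\\b|\\b[IVXLCDM]+\\b"
    
    # Each match is passed to gtools::roman2int; the surrounding text is kept.
    str_replace_all(x$col, pat, gtools::roman2int)
    
       [1] "15"        "NA"        "0"         "Red"       "4"        
       [6] "Logic"     "9. Sweet"  "8 - Apple" "Big 16"    "Weird7"   
      [11] "11: Small"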
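
The output above keeps the words around each numeral, whereas the question asks for the number alone ("9" rather than "9. Sweet"). A minimal sketch of one way to get there, reusing the same pattern: extract the first numeral in each observation and overwrite the whole observation with its value. It uses base R only, with utils::as.roman() swapped in for gtools::roman2int() as an assumption, and it handles only the first numeral per observation.

```r
# Example data from the question.
x <- data.frame(col = c("15", "NA", "0", "Red", "iv", "Logic", "ix. Sweet",
                        "VIII - Apple", "Big XVI", "WeirdVII", "XI: Small"),
                stringsAsFactors = FALSE)

# Same pattern as in the answer above.
pat <- "[IVXLCDM]{2,}|\\b[ivxlcdm]+\\b|\\b[IVXLCDM]+\\b"

# Position of the first match in each observation (-1 where there is none).
m <- regexpr(pat, x$col, perl = TRUE)
has_roman <- m != -1

# regmatches() returns only the matched substrings, in order.
numerals <- regmatches(x$col, m)

# Base R conversion, swapped in here for gtools::roman2int.
values <- as.integer(utils::as.roman(numerals))

# Overwrite only the observations that contained a numeral.
result <- x$col
result[has_roman] <- as.character(values)
result
```

The result is still a character vector, because entries such as "Red" and "Logic" are left as they are. The pattern has limits of its own: an ordinary word spelled entirely with numeral letters, such as "mix", would also count as a match.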