Tags: r, numbers, text-processing, qdap

qdap package: bug in converting zero digits to "zero" words


Before I (as a rookie) go and submit this as an R package bug, let me run it by y'all. I think all of the following outputs are good:

replace_number("123 0 boogie")
[1] "one hundred twenty three boogie"
replace_number("1;1 foo")
[1] "one;one foo"
replace_number("47 bar")
[1] "forty seven bar"
replace_number("0")
[1] "zero"

I think all of the following are bad because "zero" is missing from the output:

replace_number("1;0 foo")
[1] "one; foo"
replace_number("00 bar")
[1] "bar"
replace_number("0x")
[1] "x"

Basically, I'd say that replace_number() is incapable of handling strings that contain the digit 0 (except for "0"). Is it a real bug?


Solution

  • If you dig into the guts of replace_number:

     unlist(lapply(lapply(gsub(",([0-9])", "\\1", text.var), function(x) {
            if (!is.na(x) & length(unlist(strsplit(x, "([0-9])", 
                perl = TRUE))) > 1) {
                num_sub(x, num.paste = num.paste)
            }
            else {
                x
            }
        }), function(x) mgsub(0:9, ones, x)))
    

    you can see that the problem occurs in qdap:::num_sub

    qdap:::num_sub("101", num.paste = "combine") ## "onehundredone"
    qdap:::num_sub("0", num.paste = "combine")   ## ""
    

    Digging further, you find that the issue originates in numb2word, which defines these internal lookup values

    ones <- c("", "one", "two", "three", "four", "five", "six", 
        "seven", "eight", "nine")
    names(ones) <- 0:9
    

    which convert zero values to blanks. If I were facing this problem myself, I would fork the qdap repo, go to replace_number.R, and try to change this in a backward-compatible way: give replace_number a logical argument blank_zeros = TRUE that is passed down to numb2word, so that the default keeps the current behaviour and blank_zeros = FALSE spells out the zeros, e.g.

    ones <- c(if (blank_zeros) "" else "zero",
              "one", "two", "three", "four", "five", "six", 
        "seven", "eight", "nine")
    

    In the meantime I have posted this on the qdap issues list.
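
To make the blank_zeros idea concrete, here is a minimal base-R sketch. It is not qdap's code: digit_words and spell_digits are hypothetical helpers made up for illustration, and they spell out digits one at a time rather than building full number words the way num_sub does. It only shows how a single flag on the lookup vector switches between the current behaviour and the proposed one.

```r
# Hypothetical stand-in for the lookup vector inside numb2word (not qdap's code).
# blank_zeros = TRUE mimics the current behaviour; FALSE is the proposed fix.
digit_words <- function(blank_zeros = TRUE) {
  ones <- c(if (blank_zeros) "" else "zero",
            "one", "two", "three", "four", "five", "six",
            "seven", "eight", "nine")
  names(ones) <- 0:9
  ones
}

# Illustration only: replace every digit character with its word.
spell_digits <- function(x, blank_zeros = TRUE) {
  ones <- digit_words(blank_zeros)
  chars <- strsplit(x, "", fixed = TRUE)[[1]]   # split into single characters
  is_digit <- chars %in% names(ones)            # which characters are 0-9
  chars[is_digit] <- ones[chars[is_digit]]      # look up by name, not position
  paste(chars, collapse = "")
}

spell_digits("1;0 foo")                       # "one; foo"     (zero dropped)
spell_digits("1;0 foo", blank_zeros = FALSE)  # "one;zero foo" (zero kept)
```

Keeping TRUE as the default means existing callers would see no change, which is the point of doing it in a backward-compatible way.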