Before (as a rookie) I go submitting this as an R package bug, let me run it by y'all. I think all of the following are good:
replace_number("123 0 boogie")
[1] "one hundred twenty three boogie"
replace_number("1;1 foo")
[1] "one;one foo"
replace_number("47 bar")
[1] "forty seven bar"
replace_number("0")
[1] "zero"
I think all of the following are bad because "zero" is missing from the output:
replace_number("1;0 foo")
[1] "one; foo"
replace_number("00 bar")
[1] "bar"
replace_number("0x")
[1] "x"
Basically, I'd say that replace_number() is incapable of handling strings that contain the digit 0 (except for "0"). Is it a real bug?
If you dig into the guts of replace_number:
unlist(lapply(lapply(gsub(",([0-9])", "\\1", text.var), function(x) {
    if (!is.na(x) & length(unlist(strsplit(x, "([0-9])",
        perl = TRUE))) > 1) {
        num_sub(x, num.paste = num.paste)
    }
    else {
        x
    }
}), function(x) mgsub(0:9, ones, x)))
you can see that the problem occurs in qdap:::num_sub:
qdap:::num_sub("101", num.paste = "combine") ## "onehundredone"
qdap:::num_sub("0", num.paste = "combine") ## ""
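As a side note on the branch condition in the replace_number body quoted above, the sketch below reproduces just that test in base R, to show which inputs get routed to num_sub. The helper name takes_num_sub_branch is made up for illustration and is not part of qdap. A lone "0" splits into a single empty piece, so it skips num_sub and only goes through the final mgsub pass, which would explain why it is the one case that comes back as "zero" (assuming that pass maps 0 to "zero", as the output in the question suggests).

```r
# Hypothetical helper (not part of qdap): mirrors the condition that
# replace_number uses to decide whether a string is passed to num_sub.
takes_num_sub_branch <- function(x) {
  !is.na(x) && length(unlist(strsplit(x, "([0-9])", perl = TRUE))) > 1
}

inputs <- c("0", "00", "0x", "1;0 foo")
branch <- vapply(inputs, takes_num_sub_branch, logical(1))
print(branch)
# "0" is FALSE (it bypasses num_sub); the other three are TRUE
```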
Digging within that function, the issue occurs in numb2word, which has this internal code:
ones <- c("", "one", "two", "three", "four", "five", "six",
          "seven", "eight", "nine")
names(ones) <- 0:9
This converts zero values to blanks. If I were facing this problem myself, I would fork the qdap repo, go to replace_number.R, and try to change this in a backward-compatible way, so that replace_number could take a logical argument blank_zeros=TRUE that gets passed down to numb2word and does the right thing, e.g.
ones <- c(if (blank_zeros) "" else "zero",
          "one", "two", "three", "four", "five", "six",
          "seven", "eight", "nine")
In the meantime I have posted this on the qdap issues list.
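For anyone who wants to try the blank_zeros idea in isolation, here is a minimal base R sketch of the lookup table only. The function digit_words is a made-up name for illustration; it is not qdap's numb2word and it does not handle tens, hundreds, or anything beyond single digits. The default blank_zeros = TRUE keeps the current behaviour, so existing callers would see no change.

```r
# Hypothetical stand-in for the digit lookup inside numb2word.
# blank_zeros = TRUE reproduces the current behaviour (0 becomes "").
digit_words <- function(digits, blank_zeros = TRUE) {
  ones <- c(if (blank_zeros) "" else "zero",
            "one", "two", "three", "four", "five", "six",
            "seven", "eight", "nine")
  names(ones) <- 0:9
  # Index by name so that 0 picks the first element rather than nothing
  unname(ones[as.character(digits)])
}

digit_words(c(1, 0, 7))                       # "one" ""     "seven"
digit_words(c(1, 0, 7), blank_zeros = FALSE)  # "one" "zero" "seven"
```

Indexing by as.character(digits) matters here: ones[0] by position would return an empty vector, while ones["0"] returns the first element.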