Unexpected behaviour with str_replace "NA"


I'm trying to convert a character vector to numeric and have encountered some unexpected behaviour with str_replace. Here's a minimal working example:

library(stringr)
x <- c("0", "NULL", "0")

# This works, i.e. 0 NA 0
as.numeric(str_replace(x, "NULL", ""))

# This doesn't, i.e. NA NA NA
as.numeric(str_replace(x, "NULL", NA))

To my mind, the second example should work, since it should replace only the second entry in the vector with NA (which is a valid value in a character vector). But it doesn't: the inner str_replace converts all three entries to NA.

What's going on here? I had a look through the documentation for str_replace and stri_replace_all but don't see an obvious explanation.

EDIT: To clarify, this is with stringr_1.0.0 and stringi_1.0-1 on R 3.1.3, Windows 7.


Solution

  • Look at the source code of str_replace.

    function (string, pattern, replacement) 
    {
        replacement <- fix_replacement(replacement)
        switch(type(pattern), empty = , bound = stop("Not implemented", 
            call. = FALSE), fixed = stri_replace_first_fixed(string, 
            pattern, replacement, opts_fixed = attr(pattern, "options")), 
            coll = stri_replace_first_coll(string, pattern, replacement, 
                opts_collator = attr(pattern, "options")), regex = stri_replace_first_regex(string, 
                pattern, replacement, opts_regex = attr(pattern, 
                    "options")), )
    }
    <environment: namespace:stringr>
    

    This leads to fix_replacement, which is on GitHub; I've also put it below. If you run it in your global environment, you will find that fix_replacement(NA) returns NA. You can see that it relies on stri_replace_all_regex, which is from the stringi package.

    fix_replacement <- function(x) {
        stri_replace_all_regex(
            stri_replace_all_fixed(x, "$", "\\$"),
            "(?<!\\\\)\\\\(\\d)",
            "\\$$1")
    }
    

    The interesting thing is that stri_replace_first_fixed and stri_replace_first_regex both return c(NA,NA,NA) when run with your arguments (your string, pattern, and replacement). The problem is that both functions are implemented in C++, so it gets a little trickier to figure out what's happening.

    The source for both stri_replace_first_fixed and stri_replace_first_regex is in the stringi repository on GitHub.

    As far as I can discern with limited time and my relatively rusty C++ knowledge, the function stri__replace_allfirstlast_fixed checks the replacement argument using stri_prepare_arg_string. According to the documentation for that function, it will throw an error if it encounters an NA. I don't have time to trace it fully beyond this point, but I suspect that this handling of the NA replacement is what causes every element to come back as NA.
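
Whatever the exact cause inside stringi, one way to sidestep the problem is to avoid passing NA as a replacement at all. The following is a minimal sketch in base R, not something taken from the stringr or stringi documentation. It assumes that the entries to blank out are exactly equal to "NULL" (a whole-element comparison rather than the substring match that str_replace performs). The variable names x_clean, result_index and result_replace are made up for illustration.

```r
# Sketch of a workaround in base R (no stringr/stringi calls involved).
x <- c("0", "NULL", "0")

# Option 1: assign NA only to the elements that are exactly "NULL".
x_clean <- x
x_clean[x_clean == "NULL"] <- NA_character_
result_index <- as.numeric(x_clean)

# Option 2: the same idea as a single expression using replace().
result_replace <- as.numeric(replace(x, x == "NULL", NA_character_))

print(result_index)
print(result_replace)
```

Both options leave the other elements untouched, so only the second entry becomes NA after the numeric conversion. The first example from the question, which replaces "NULL" with "" before calling as.numeric, remains a valid alternative.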