Namespace collision within R's subset() function


I've stumbled across the following namespace problem with R's subset() function so many times that I'd like to ask whether there is a more elegant solution than mine:

Species <- 'setosa'
subset(iris, Species==Species)

This returns the entire iris dataset, I think because both sides of Species==Species resolve to the column inside subset(), so the comparison is TRUE for every row.
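A quick check seems to support this reading. The sketch below is only an illustration: it uses with() as a stand-in for how subset() evaluates its condition (inside the data frame first, then the calling environment), and the names row_matches and all_rows are made up for the example.

```r
Species <- "setosa"

# Evaluate the condition the way subset() does: column names are looked up
# in the data frame first, so both sides of == are the Species column.
row_matches <- with(iris, Species == Species)
all(row_matches)

# Consequently no row is dropped.
all_rows <- subset(iris, Species == Species)
nrow(all_rows) == nrow(iris)
```

The outer variable Species is never consulted, because the column of the same name is found first.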

My solution would be

subset(iris, Species==get('Species', envir = .GlobalEnv))

but this would not work when the variable Species is only defined within the scope of a function.

It would of course also be possible to use a different variable name, like species (lowercase), for the global variable. However, I think this would be less readable, and as an end user I would expect R to allow this kind of comparison of two variables with the same name from different namespaces.


Solution

  • The base R subset() function simply does not handle the case of identical variable names well. I think that's one of the reasons the subset help page contains the following warning:

    This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

    That message is basically suggesting you use [ instead, where the column is referenced explicitly:

    iris[iris$Species==Species, ]
    

    An alternative to get() is to use the globalenv() function to access the global environment, although like get() this only helps when Species is defined globally:

    subset(iris, Species==globalenv()$Species)
    

    If you are using dplyr's filter() function, there is a way to be explicit with the .env pronoun:

    dplyr::filter(iris, Species==.env$Species)
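
For the case raised in the question, where Species exists only inside a function, the [ approach should carry over unchanged. A minimal sketch, assuming a hypothetical helper named filter_species (not part of base R or any package):

```r
# Hypothetical helper: the argument deliberately shares its name with the column.
filter_species <- function(data, Species) {
  # `[` uses standard evaluation, so the bare `Species` is the function
  # argument, while the column is reached explicitly via `data$Species`.
  data[data$Species == Species, ]
}

setosa_rows <- filter_species(iris, "setosa")
nrow(setosa_rows)
unique(as.character(setosa_rows$Species))
```

Because the column is always written as data$Species, it cannot be confused with the argument of the same name, and no lookup in the global environment is needed.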