Tags: r, matrix, sparse-matrix, r-s4, bioconductor

R - how to reload a function (or change package priority)


I am using the Matrix library to work with sparse matrices. Occasionally, I need to run a function that uses a Bioconductor package, which depends on the S4Vectors library. Unfortunately, the "colSums" function in Matrix conflicts with the "colSums" function in S4Vectors. Therefore, when I run this function, it breaks my "colSums" function, which is really annoying.

I know of two common solutions to this problem:

1) Load the Bioconductor package before loading Matrix. However, I seldom use this function, so I would prefer to load the Bioconductor package only when I need it.
2) Call "Matrix::colSums" instead of "colSums". However, this is super inconvenient, and I would need to change my entire code base.

Ideally, I would just load the Bioconductor package, run my function, and then clean up my environment by either unloading the Bioconductor package or reloading the Matrix package. However, I am having trouble doing either. First, is it possible to reload Matrix::colSums (so that it replaces S4Vectors::colSums)? Second, when I try to unload S4Vectors, R complains because many other packages depend on it.

So, aside from the obvious question of why S4Vectors has a function that conflicts with the most widely used sparse matrix package in R, I'm wondering what the best solution to this problem is. It can't possibly be that difficult to simply reload a package, right?


Solution

  • The right way is, as you already know, to write Matrix::colSums.

    A simple solution that does not require rewriting your code is to add the line

    colSums <- Matrix::colSums
    

    somewhere in your code, before your first call to colSums. This colSums then lives in your global environment, which is always first on the search path, so it is found before the colSums of any attached package.
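
A minimal sketch of the effect of the global copy, using only Matrix (a recommended package that ships with R). The small sparse matrix `m` is made up for illustration:

```r
library(Matrix)  # recommended package, installed with R by default

# Copy Matrix's colSums generic into the global environment. .GlobalEnv is
# always first on search(), so this binding shadows any colSums that a
# package attached later (for example S4Vectors) might export.
colSums <- Matrix::colSums

# Illustrative 2 x 3 sparse matrix with two non-zero entries.
m <- sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(5, 7), dims = c(2, 3))

colSums(m)                 # 5 0 7, computed by Matrix's sparse method
colSums(matrix(1:4, 2))    # 3 7, ordinary dense matrices work too
find("colSums")[1]         # ".GlobalEnv": the global copy is found first

# To go back to normal search-path lookup later, remove the copy:
# rm(colSums)
```

Because the global object is the Matrix generic itself, nothing else in your code base has to change.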

    EDIT

    I found a better solution. I will demonstrate it with plyr and dplyr, since both have an arrange function and therefore conflict.

    Example 1. dplyr is loaded later, hence it wins.

    library(plyr)
    library(dplyr)
    environment(arrange)  
    # <environment: namespace:dplyr>
    

    Example 2. plyr is loaded later, hence it wins.

    # unload both packages so that the attach order can be changed
    unloadNamespace("plyr")
    unloadNamespace("dplyr")
    library(dplyr)
    library(plyr)
    environment(arrange)
    # <environment: namespace:plyr>
    

    The key is the search order, which you can inspect with the search() function. Below, you can see that plyr comes before dplyr, so plyr's arrange is found first.

    search()
    # [1] ".GlobalEnv"        "package:plyr"      "package:dplyr"     "tools:rstudio"    
    # [5] "package:stats"     "package:graphics"  "package:grDevices" "package:utils"    
    # [9] "package:datasets"  "package:methods"   "Autoloads"         "package:base" 
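
The "last attached wins" rule can be reproduced without installing plyr or dplyr. In this sketch two attached lists stand in for the packages; the names `mock:first`, `mock:second` and the toy `arrange` functions are hypothetical:

```r
# Simulate two packages that both export `arrange` by attaching plain lists.
attach(list(arrange = function() "from mock:first"), name = "mock:first")
attach(list(arrange = function() "from mock:second"), name = "mock:second")

# attach() inserts at position 2 by default, so the later entry sits in
# front of the earlier one on the search path.
providers <- find("arrange")   # c("mock:second", "mock:first"), lookup order
winner <- arrange()            # "from mock:second": the first match wins

print(providers)
print(winner)

# Remove the simulated entries again.
detach("mock:second")
detach("mock:first")
```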
    

    Example 3. You can choose where in the search list a package is attached with the pos argument of library().

    unloadNamespace("plyr")
    unloadNamespace("dplyr")    
    
    library(plyr)
    library(dplyr, pos=length(search()))
    environment(arrange)
    # <environment: namespace:plyr>
    
    search()
    # [1] ".GlobalEnv"        "package:plyr"      "tools:rstudio"     "package:stats"    
    # [5] "package:graphics"  "package:grDevices" "package:utils"     "package:datasets" 
    # [9] "package:methods"   "Autoloads"         "package:dplyr"     "package:base" 
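
The same kind of simulation, as a hedged sketch, shows what pos = length(search()) does: the new entry lands directly in front of package:base, behind everything attached before it, so it loses every name clash. The `mock:*` names are again hypothetical:

```r
attach(list(arrange = function() "from mock:early"), name = "mock:early")

# package:base is always last, so length(search()) is its current position;
# attaching at that position puts the new entry directly in front of base.
attach(list(arrange = function() "from mock:late"), name = "mock:late",
       pos = length(search()))

late_pos <- match("mock:late", search())
base_pos <- match("package:base", search())
winner <- arrange()   # "from mock:early": mock:late is searched later

print(c(late = late_pos, base = base_pos))
print(winner)

detach("mock:late")
detach("mock:early")
```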
    

    In conclusion, you can load the Bioconductor package with a large number as pos. However, you said that package depends on S4Vectors, and S4Vectors is the one causing the conflict. Unfortunately, you cannot directly control where dependencies are attached, because they are declared by the Bioconductor package itself and library() attaches them without using your pos value.
    A workaround is to attach S4Vectors yourself first, with the pos argument, and then load the Bioconductor package:

    library(S4Vectors, pos=10)  # replace 10 with an appropriately large number
    library(Bioconductor)       # placeholder: the Bioconductor package you actually use
    

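    Because S4Vectors is then already on the search path, loading the Bioconductor package leaves it where it is, so S4Vectors stays after Matrix in the search order (provided Matrix is attached at an earlier position).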
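
The "appropriately large number" can be computed instead of guessed, as in Example 3. The helper `attach_last()` below is hypothetical (not part of any package): it attaches a package just in front of package:base and returns its position. Since pos applies only to the package you name and not to the packages it depends on, it is worth checking find("colSums") afterwards. The runnable part uses the base package splines as a harmless stand-in:

```r
# Hypothetical helper: attach `pkg` just in front of package:base, so every
# package attached earlier (e.g. Matrix) keeps priority for shared names.
# `pos` only affects `pkg` itself, not the packages listed in its Depends.
attach_last <- function(pkg) {
  library(pkg, character.only = TRUE, pos = length(search()))
  match(paste0("package:", pkg), search())
}

# Stand-in demonstration with a base package that is not attached by default.
pos_splines <- attach_last("splines")
print(pos_splines)   # one less than length(search()): just before base

# Intended use for the question (not run here, needs Bioconductor installed):
# attach_last("S4Vectors")
# library(yourBiocPackage)   # hypothetical name, use your actual package
# find("colSums")            # check which environment now supplies colSums
```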

    YET ANOTHER SOLUTION

    If you want to reload Matrix instead, you can unload it and attach it again. Here is the idea with plyr and dplyr:

    library(dplyr)
    library(plyr)
    environment(arrange)
    # <environment: namespace:plyr>
    
    unloadNamespace("dplyr")
    library(dplyr)
    environment(arrange)
    # <environment: namespace:dplyr>
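
For Matrix specifically, unloadNamespace("Matrix") can fail for the same reason you saw with S4Vectors: it refuses while other loaded packages import it. A gentler variant, sketched below, is to detach only the search-path entry (the namespace stays loaded) and attach it again. library() on its own does nothing when the package is already attached, which is why a second library(Matrix) call does not restore priority. The helper name `reattach_first()` is hypothetical, and detach() may refuse if another attached package lists the target in its Depends:

```r
# Hypothetical helper: move an attached package back to position 2 so its
# exports win name clashes again. The namespace itself is never unloaded.
reattach_first <- function(pkg) {
  env_name <- paste0("package:", pkg)
  if (env_name %in% search()) {
    # character.only = TRUE because env_name is a variable, not a literal
    detach(env_name, character.only = TRUE)
  }
  library(pkg, character.only = TRUE)   # attaches at pos = 2 by default
  match(env_name, search())
}

library(Matrix)
library(splines)                  # stand-in for a package attached after Matrix
print(search()[2:3])              # "package:splines" "package:Matrix"
print(reattach_first("Matrix"))   # 2
print(search()[2:3])              # "package:Matrix" "package:splines"

# In the question's workflow: run the Bioconductor function, then call
# reattach_first("Matrix")
```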