
Two-column output of setdiff() in R


I work in proteomics, and I'm using R for the first time.

My input is a .txt file containing a list of protein names. When I use the setdiff() function on one specific pair of data sets, instead of giving me a single column of proteins it gives me a two-column output. This didn't happen when I did the same thing with other data sets.

I already looked for a solution online and in the R help, but I couldn't find anyone with the same problem.

My code is:

v1 <- readLines("C:\\Users\\ACER\\Documents\\V4\\Duo\\ECC_CC.txt")
v2 <- readLines("C:\\Users\\ACER\\Documents\\V4\\Duo\\ECC_CSC.txt") 
vlist <- list(v1, v2) 
names(vlist) <- c("Cell Line", "CSC") 
setdiff(v1, v2)

I even tried switching the order of the inputs and editing the .txt files to see whether the problem was with them.


Solution

  • Welcome to R and Stack Overflow! 😀

    First of all, according to its documentation (run ?base::setdiff in the R console to see it), setdiff(x, y) computes the asymmetric set difference of two vectors: the elements of x that do not appear in y, with duplicates removed.

    If you look at the examples in that help page, you will see that there is no need to put your vectors into a list. You can apply setdiff() directly to see which elements of x don't appear in y:

    x <- c(sort(sample(1:20, 9)), NA)
    y <- c(sort(sample(3:23, 7)), NA)
    setdiff(x, y)
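Applied to protein names, a minimal sketch shows why the argument order matters. The names below are made up for illustration and are not from your files:

```r
# Hypothetical protein names, for illustration only
cell_line <- c("ALB", "APOA1", "TP53", "GAPDH")
csc       <- c("TP53", "GAPDH", "CD44")

# Proteins found in cell_line but not in csc: c("ALB", "APOA1")
only_cell_line <- setdiff(cell_line, csc)
print(only_cell_line)

# Swapping the arguments gives a different answer: "CD44"
only_csc <- setdiff(csc, cell_line)
print(only_csc)
```

So setdiff(v1, v2) and setdiff(v2, v1) answer two different questions ("only in the cell line" vs. "only in the CSCs").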
    

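    Or, if you were working with data frames, you could compare them row by row. For example, with the built-in mtcars data, dplyr::setdiff() returns the rows of mtcars that don't appear in mtcars2 (base setdiff() is meant for vectors, so this example uses the dplyr version):

    # dplyr::setdiff() compares data frames row by row
    mtcars2 <- mtcars[-1,]   # mtcars without its first row
    dplyr::setdiff(mtcars, mtcars2)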
    

    In your case, readLines() returns plain character vectors (one element per line of the file), not data frames, so you can pass v1 and v2 straight to setdiff():

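    # readLines() returns a character vector, one element per line of the file
    v1 <- readLines("C:\\Users\\ACER\\Documents\\V4\\Duo\\ECC_CC.txt")
    v2 <- readLines("C:\\Users\\ACER\\Documents\\V4\\Duo\\ECC_CSC.txt")
    # Proteins listed in ECC_CC.txt that are not in ECC_CSC.txt
    setdiff(v1, v2)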
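One likely reason for the "two-column" look (an assumption, since the actual output isn't shown in the question): setdiff() returns a single character vector, and when R prints a long vector of short strings it fits several elements on each console line, with the bracketed number at the start of each line giving the index of the first element on that line. Longer protein names mean fewer fit per line, which could explain why other data sets looked like a single column. The sketch below uses stand-in vectors (proteins_a, proteins_b) in place of v1 and v2, and a hypothetical output file name in a temporary folder; swap in your own data and path:

```r
# Stand-in vectors; in practice these are v1 and v2 from readLines()
proteins_a <- c("P1", "P2", "P3", "P4", "P5", "P6")
proteins_b <- c("P2", "P4")

diff_proteins <- setdiff(proteins_a, proteins_b)

# print() may spread a long vector over several columns, but it is still
# one character vector: length() counts every protein
print(length(diff_proteins))

# One protein per line in the console
writeLines(diff_proteins)

# Or as a one-column data frame
result <- data.frame(protein = diff_proteins)
print(result)

# Or saved to a text file, one protein per line (hypothetical file name)
out_file <- file.path(tempdir(), "ECC_CC_only.txt")
writeLines(diff_proteins, out_file)
```

writeLines() is usually the simplest choice if you only need to read or save the list; the data frame is handy if you later want to add columns next to each protein.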
    

    You could also do this by performing an outer join with the merge() function; take a look at this post.
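As a rough sketch of the merge() idea (the data frame and column names df1, df2, protein and in_csc are hypothetical): a left outer join with all.x = TRUE keeps every row of the first table, and the rows that found no match in the second table are the set difference. Unlike setdiff(), this keeps any duplicates from v1, and the result comes back sorted because merge() sorts on the key by default.

```r
# Made-up protein names for illustration
v1 <- c("ALB", "APOA1", "TP53", "GAPDH")
v2 <- c("TP53", "GAPDH", "CD44")

df1 <- data.frame(protein = v1)
df2 <- data.frame(protein = v2, in_csc = TRUE)

# Left outer join: every protein from df1, with in_csc = NA where df2 has no match
joined <- merge(df1, df2, by = "protein", all.x = TRUE)

# Rows without a match are the proteins that appear only in v1
only_in_v1 <- joined$protein[is.na(joined$in_csc)]
print(only_in_v1)
```

For a plain list of names, setdiff() is simpler; the join becomes useful when each protein carries extra columns you want to keep.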

    I hope that this helps you!