How to select and remove specific elements or find their index in a vector or matrix?


Let's say I have two vectors:

x <- c(1,16,20,7,2)

y <- c(1, 7, 5,2,4,16,20,10)

I want to remove elements in y that are not in x. That is, I want to remove elements 5, 4, 10 from y.

y
[1] 1 7 2 16 20 

In the end, I want vectors x and y to have the same elements. Order does not matter.

My thoughts: The match function lists the indices where the two vectors contain a matching element, but I need a function that does essentially the opposite: one that gives the indices where the elements of the two vectors don't match.

# this lists the indices in y that match the elements in x
match(x,y)
[1] 1 6 7 2 4   # these are the indices that I want; I want to remove
                # the other indices from y

Does anyone know how to do this? Thank you.


Solution

  • You are after intersect

    intersect(x,y)
    ## [1]  1 16 20  7  2
    

    If you want the indices of the elements of y that are in x, use which and %in% (%in% uses match internally, so you were on the right track here)

    which(y %in% x)
    ## [1] 1 2 4 6 7
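
    To actually drop the unwanted elements from y rather than just list the indices, the logical vector from %in% can be used to subset y directly. This is a minimal sketch using the vectors from the question; the names keep and y_kept are only illustrative.

    ```r
    x <- c(1, 16, 20, 7, 2)
    y <- c(1, 7, 5, 2, 4, 16, 20, 10)

    # logical mask: TRUE where an element of y also occurs in x
    keep <- y %in% x

    # subsetting with the mask drops 5, 4 and 10 and keeps y's original order
    y_kept <- y[keep]
    y_kept
    ## [1]  1  7  2 16 20

    # order does not matter for the comparison, so check the two as sets
    setequal(x, y_kept)
    ## [1] TRUE
    ```

    The which() call is not needed for subsetting, since R accepts the logical vector as an index.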
    

    As @joran points out in the comments, intersect drops duplicates, so if you want to keep every matching element, a safer option would be something like

    intersection <- function(x, y) {
      # keep every element of x that also occurs in y, duplicates included
      x[x %in% y]
    }
    
    
    x <- c(1,1,2,3,4)
    y <- c(1,2,3,3)
    
    intersection(x,y)
    ## [1] 1 1 2 3
    # compare with
    intersect(x,y)
    ## [1] 1 2 3
    
    intersection(y,x)
    ## [1] 1 2 3 3
    # compare with 
    intersect(y, x)
    ## [1] 1 2 3
    

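    With this modified function you need to be careful about argument order: it returns the matching elements of its first argument, duplicates included, so intersection(x,y) and intersection(y,x) can differ. intersect avoids this because it drops duplicated elements.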


    If you want the indices of those elements of y not in x, simply prefix with !, as %in% returns a logical vector

    which(!(y %in% x))
    ## [1] 3 5 8
    

    Or, if you want the elements themselves, use setdiff

    setdiff(y,x)
    ## [1]  5  4 10
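
    Like intersect, setdiff returns unique values, so repeated elements collapse to one. If the duplicates matter, negating the %in% mask and subsetting keeps them. The sketch below uses a hypothetical vector y_dup (not from the question) that contains a repeated 5; the result names are also just for illustration.

    ```r
    x <- c(1, 16, 20, 7, 2)
    y_dup <- c(1, 5, 5, 2, 10)   # hypothetical vector with a repeated 5

    # setdiff reports each missing value once
    unique_missing <- setdiff(y_dup, x)
    unique_missing
    ## [1]  5 10

    # subsetting with the negated mask keeps every occurrence
    all_missing <- y_dup[!(y_dup %in% x)]
    all_missing
    ## [1]  5  5 10
    ```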