Tags: r, vector, subset, subtraction

How to subtract one character vector, including its repeated elements, from another vector in R


Given these two vectors:

x <- c("A","A","C","A","B","B","B","B","D","E")
y <- c("A","B","B","B","E")

I want to subtract y from x, that is, remove one "A", three "B"s and one "E" from x, so that xNew is c("A", "C", "A", "B", "D"). It also means that

length(xNew) == length(x) - length(y)

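setdiff doesn't work because it treats both vectors as sets: it drops every element of x that occurs in y and removes duplicates from the result: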

xNew <- setdiff(x,y)
xNew 
[1] "C" "D"

match doesn't work either:

xNew <- x[-match(y,x)]
xNew
[1] "A" "C" "A" "B" "B" "B" "D"

match returns the position of the first "B" (the fifth element) for all three "B"s in y, so only one "B" is removed and three "B"s remain.
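Printing the indices that match returns makes the problem visible. This is a minimal sketch using the same x and y as in the question; idx is just an illustrative name.

```r
x <- c("A","A","C","A","B","B","B","B","D","E")
y <- c("A","B","B","B","E")

# match() reports the FIRST position in x for each element of y,
# so the three "B"s in y all map to the same index
idx <- match(y, x)
idx
# [1]  1  5  5  5 10

# negative indexing drops each distinct position only once,
# so only elements 1, 5 and 10 are removed
x[-idx]
# [1] "A" "C" "A" "B" "B" "B" "D"
```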

Does anyone know how to do this? Is there a function available in R, or do I need to write my own? Thanks a lot in advance.


Solution

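  • You can use the function pmatch. With its default duplicates.ok = FALSE, each element of x can be matched at most once, so repeated values in y are matched to successive positions in x: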

    x[-pmatch(y,x)]
    #[1] "A" "C" "A" "B" "D"
    

    Edit
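    pmatch also performs partial matching, so if your data can contain strings of more than one character (where, for example, "A" could match "AB"), here is an option that compares exact counts instead. Note that the result is grouped by value, in order of first appearance, rather than kept in the original order of x: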

    xNew <- unlist(sapply(x[!duplicated(x)], 
                          function(item, tab1, tab2) {
                              rep(item,
                                  tab1[item] - ifelse(item %in% names(tab2), tab2[item], 0))
                           }, tab1=table(x), tab2=table(y)))
    

    Example

    x <- c("AB","BA","C","CA","B","B","B","B","D","E")
    y <- c("A","B","B","B","E")
    xNew
    #  AB   BA    C   CA    B    D 
    #"AB" "BA"  "C" "CA"  "B"  "D"
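As a further sketch, not part of the original answer: if the original order of x should be kept and only exact matches removed, a small helper can walk through y and drop the first remaining equal element of x each time. The name vec_subtract is hypothetical, and the sketch assumes that elements of y with no exact match in x should simply be ignored.

```r
# Hypothetical helper: remove each element of y from x once,
# using exact comparison and keeping the original order of x.
vec_subtract <- function(x, y) {
  keep <- rep(TRUE, length(x))
  for (item in y) {
    # first position that is still kept and equals item exactly
    hit <- which(keep & x == item)[1]
    # assumption: items of y not found in x are silently skipped
    if (!is.na(hit)) keep[hit] <- FALSE
  }
  x[keep]
}

x <- c("A","A","C","A","B","B","B","B","D","E")
y <- c("A","B","B","B","E")
vec_subtract(x, y)
# [1] "A" "C" "A" "B" "D"

# multi-character strings: "A" has no exact match, so "AB" is left alone
x2 <- c("AB","BA","C","CA","B","B","B","B","D","E")
vec_subtract(x2, y)
# [1] "AB" "BA" "C"  "CA" "B"  "D"
```

The loop costs one pass over x per element of y, which is fine for short vectors; for large inputs a count-based approach such as the table version above may be preferable.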