Search code examples
rcomparisonintersect

comparing two variables in the same column in R


I have two columns. one has a list of variables e.g. "cat", "dog", "rat", "chicken" and the other is whether the pet shop was visited on the first or second trip.

visit_number    pet
      1         dog
      2         dog
      1         cat
      2         cat
      1         rat
      2         chicken

I am looking to get compare the differences between the two visits in R e.g. intersect() and setdiff(). Basically exactly the same as this question:

Compare two lists in R

However, I don't have two lists but have two variables in a single column and I cant seem to get the code to work.

what I am trying to achieve is a function like this but that uses the single column instead rather than the two lists (code taken from the other question):

xtab_set <- function(A,B){
    both    <-  union(A,B)
    inA     <-  both %in% A
    inB     <-  both %in% B
    return(table(inA,inB))
}

Solution

  • Frankly speaking, the output matrix is not very clear. However, you mentioned at the comment that you are “looking for the number (count) of unique individual animals per visit that occurred only in visit one, only in visit two and occurred i both visits.” Also in the document you provided there are three visits. I am considering three visits.

    The following code will show the number of unique individual animals by visits as well as number of unique individual animals that appeared in all visits.

    Step 1. Build a raw dataset

    library(data.table)
    df = data.table(visit_number = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3), 
                    pet = c("Dog", "Rat", "Cat", "Dog", "Chicken", "Cat", "Dog", "Cat", "Fish", "Horse"))
    

    Step 2. Create a vector of understandable column names for future reference

    cols = c(paste0(rep("Visit", length(unique(df$visit_number))), unique(df$visit_number)))
    

    Step 3. Create a matrix of pets appearance

    df = dcast.data.table(df, pet ~ visit_number, value.var = "pet", fun.aggregate = length)
    names(df)[-1] = cols # assign understandable column names
    

    Step 4. Define pets that appeared in all visits

    df[, AllVisits := Reduce(`*`, .SD), .SDcols = cols]
    

    It gives:

    df
           pet Visit1 Visit2 Visit3 AllVisits
    1:     Cat      1      1      1         1
    2: Chicken      0      1      0         0
    3:     Dog      1      1      1         1
    4:    Fish      0      0      1         0
    5:   Horse      0      0      1         0
    6:     Rat      1      0      0         0
    

    Rat was unique for Visit 1, Chicken was unique for Visit 2, Fish and Horse were unique for Visit 3. Cat and Dog appeared in all visits.

    Step 5. Get the number of unique number of animals by visits and unique number of animals that appeared in all visits

    idx = df[, Reduce(`+`, .SD) == 1, .SDcols = cols]
    unlist(c(df[idx, lapply(.SD, function(x) sum(x)), .SDcols = cols], AllVisits = df[, sum(AllVisits)]))
    

    The result is:

    Visit1    Visit2    Visit3 AllVisits 
         1         1         2         2 
    

    Let me know if that is what you are looking for.

    P.S. The code will require modification if pets may appear several times during the visit.