
Is there an equivalent of the Unix "comm" command in R?


I have a master file with a list of unique IDs and want to align three files, each containing a subset of those IDs, alongside it, ending up with: Column 1 (id1, id2, id3, id4, etc.), Column 2 (space, id2, space, space), Column 3 (id1, id2, space, space), Column 4 (id1, space, id3, space), etc. I have the unique list in R, and the "comm" command in Unix seems to do this. Is there an equivalent in R?


Solution

  • The structure of your data is not very clear, but suppose you start with the following vectors:

    R> master <- paste("id",1:10,sep="")
    R> sub1 <- paste("id",c(2,3,5),sep="")
    R> sub2 <- paste("id",c(1,4,8,9),sep="")
    R> master
    [1] "id1"  "id2"  "id3"  "id4"  "id5"  "id6"  "id7"  "id8"  "id9"  "id10"
    R> sub1
    [1] "id2" "id3" "id5"
    R> sub2
    [1] "id1" "id4" "id8" "id9"
    

    You can create a data frame from your master list of IDs and use these IDs as row names:

    R> df <- data.frame(master=master, row.names=master)
    R> df
         master
    id1     id1
    id2     id2
    id3     id3
    id4     id4
    id5     id5
    id6     id6
    id7     id7
    id8     id8
    id9     id9
    id10   id10
    

    Then you can add a new column for each subset as follows. Indexing the rows by name puts each ID in its matching row, and rows without a match are left as NA:

    R> df[sub1, "sub1"] <- sub1
    R> df[sub2, "sub2"] <- sub2
    

    This gives the following result:

    R> df
         master sub1 sub2
    id1     id1 <NA>  id1
    id2     id2  id2 <NA>
    id3     id3  id3 <NA>
    id4     id4 <NA>  id4
    id5     id5  id5 <NA>
    id6     id6 <NA> <NA>
    id7     id7 <NA> <NA>
    id8     id8 <NA>  id8
    id9     id9 <NA>  id9
    id10   id10 <NA> <NA>
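
If you want blank cells rather than NA, as described in the question, and have several subsets to align, the same idea can be sketched with `%in%` and `ifelse()`. This is only a minimal sketch under assumptions: the list name `subsets` and the variables `aligned` and `result` are made up for illustration, and the vectors are the same toy data as above (in practice they might come from something like `readLines()` on your files).

```r
# Same illustrative vectors as above; 'subsets' is a hypothetical named list.
master <- paste("id", 1:10, sep = "")
subsets <- list(
  sub1 = paste("id", c(2, 3, 5), sep = ""),
  sub2 = paste("id", c(1, 4, 8, 9), sep = "")
)

# For each subset, keep the ID where it occurs in master and use "" elsewhere.
aligned <- lapply(subsets, function(ids) ifelse(master %in% ids, master, ""))

# Combine the master column with one aligned column per subset.
result <- data.frame(master = master, aligned, stringsAsFactors = FALSE)
print(result)
```

Note that this sketch silently ignores any ID in a subset that does not appear in `master`, which may or may not be what you want.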