
Create Vector from LookUp-Table / CSV-File in R


I have a CSV file in which two columns contain one or more integers per cell.

df <- data.frame(x=c("a","b","a","b"), 
y=c("datatype 1","datatype 1","datatype 2", "datatype 2"), 
z=c("2,3", "1,2","1,2,3,4,5", "3"))

names(df) <- c("hypothesis", "type", "mass") 

> df
  hypothesis       type      mass
1          a datatype 1       2,3
2          b datatype 1       1,2
3          a datatype 2 1,2,3,4,5
4          b datatype 2         3

I want to extract those integers from the .csv as vectors and assign them to variables x (datatype 1, hypothesis a) and y (datatype 2, hypothesis a) in my code.

Right now, I'm using subset to filter the table by data type (column 2) and which on "hypothesis" (column 1) to get the corresponding "mass" values I need. In the next step, I want to use intersect to find out which elements are shared by the x and y variables.

My question is: how can I turn the content of a .csv cell such as "1,2,3" into a vector that the intersect function can be applied to?

When I just access the cell, typeof returns integer, and when intersect is applied, the result is character(0). When I manually assign x <- c(1,2,3,4,5); y <- c(2,3), the result is 2 3, as it should be.


Solution

  • We can split 'mass' by 'type', split each string on commas with strsplit, unlist the result, convert it to numeric, keep the unique elements, and then apply intersect to find the elements common to all list elements. Note that this groups by 'type' only, so each vector pools the values of both hypotheses.

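    # Split 'mass' by 'type', then parse each group into a sorted vector of unique numbers.
    # as.character() guards against 'mass' being stored as a factor.
    lst <- setNames(lapply(split(df$mass, df$type), function(x) 
           sort(unique(as.numeric(unlist(strsplit(as.character(x), ",")))))), c("x", "y"))
    
    # Intersect across all list elements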
    Reduce(intersect, lst)
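
If only hypothesis a is wanted, as described in the question, the same parsing can be applied to a filtered subset. The following is a minimal sketch, not the answerer's code: parse_mass is a hypothetical helper name, and the data frame is rebuilt with stringsAsFactors = FALSE on the assumption that the typeof integer result came from 'mass' being stored as a factor (the default for strings in R versions before 4.0).

```r
# Example data from the question; stringsAsFactors = FALSE keeps 'mass' as character
df <- data.frame(
  hypothesis = c("a", "b", "a", "b"),
  type = c("datatype 1", "datatype 1", "datatype 2", "datatype 2"),
  mass = c("2,3", "1,2", "1,2,3,4,5", "3"),
  stringsAsFactors = FALSE
)

# parse_mass (hypothetical helper): turns cells like "1,2,3" into sorted unique integers.
# as.character() makes it safe to call on a factor column as well.
parse_mass <- function(cells) {
  sort(unique(as.integer(unlist(strsplit(as.character(cells), ",")))))
}

# Select the 'mass' cells for hypothesis a in each data type
x <- parse_mass(df$mass[df$type == "datatype 1" & df$hypothesis == "a"])
y <- parse_mass(df$mass[df$type == "datatype 2" & df$hypothesis == "a"])

# Elements shared by x and y
shared <- intersect(x, y)
shared
```

With the example data, x holds 2 and 3, y holds 1 through 5, and shared contains 2 and 3, matching the result of the manual assignment in the question.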