I have a CSV file in which two columns contain one or more integers per cell.
df <- data.frame(x = c("a", "b", "a", "b"),
                 y = c("datatype 1", "datatype 1", "datatype 2", "datatype 2"),
                 z = c("2,3", "1,2", "1,2,3,4,5", "3"))
names(df) <- c("hypothesis", "type", "mass")
> df
  hypothesis       type      mass
1          a datatype 1       2,3
2          b datatype 1       1,2
3          a datatype 2 1,2,3,4,5
4          b datatype 2         3
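When the data come from a file, it helps to make sure the mass column is read as character text rather than as a factor or a number. The sketch below writes the example table to a temporary CSV and reads it back with read.csv; the temporary path and the name df_csv are illustrative assumptions, and colClasses keeps mass as character even if every cell happened to hold a single number.

```r
# Recreate the example table (assumption: it mirrors the real CSV)
df <- data.frame(hypothesis = c("a", "b", "a", "b"),
                 type = c("datatype 1", "datatype 1", "datatype 2", "datatype 2"),
                 mass = c("2,3", "1,2", "1,2,3,4,5", "3"),
                 stringsAsFactors = FALSE)

# write.csv() quotes character fields, so the commas inside "1,2,3,4,5"
# stay within one cell
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Read it back: text columns stay character, and mass is forced to character
df_csv <- read.csv(path, stringsAsFactors = FALSE,
                   colClasses = c(mass = "character"))
str(df_csv)
```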
I want to extract those integers from the CSV as vectors and assign them to the variables x (datatype 1, hypothesis a) and y (datatype 2, hypothesis a) in my code.
Right now, I'm using subset to filter the table by type (column 2) and which to select the hypothesis (column 1), which gives me the corresponding "mass" values I need. In the next step I want to use intersect to find out which elements are shared by the x and y variables.
My question is: how can I turn the content of a CSV cell such as "1,2,3" into a vector to which the intersect function can be applied?
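A minimal sketch of the conversion itself: split the cell on commas with strsplit, take the first (and only) list element, and convert the pieces to integers. The helper name parse_mass is hypothetical, and the trimws call is only there on the assumption that some cells might contain spaces after the commas.

```r
# Hypothetical helper: turn one cell such as "1,2,3" into an integer vector.
# as.character() guards against factor columns; trimws() tolerates "1, 2, 3".
parse_mass <- function(cell) {
  parts <- strsplit(as.character(cell), ",", fixed = TRUE)[[1]]
  as.integer(trimws(parts))
}

x <- parse_mass("2,3")        # datatype 1, hypothesis a
y <- parse_mass("1,2,3,4,5")  # datatype 2, hypothesis a
intersect(x, y)               # 2 3
```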
When I just access the cell, typeof returns integer, and applying intersect gives character(0). When I assign the vectors manually, x <- c(1,2,3,4,5); y <- c(2,3), the result is 2 3, as it should be.
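A small reproduction of both symptoms, assuming the column was read as a factor (the default for data.frame() and read.csv() before R 4.0.0): a factor's typeof is integer because it stores integer codes plus levels, and intersect on two unsplit cells compares whole strings such as "2,3" and "1,2,3,4,5", which never match. The names cell_x and cell_y are illustrative.

```r
# Assumption: each cell came back as a factor
cell_x <- factor("2,3")
cell_y <- factor("1,2,3,4,5")

typeof(cell_x)  # "integer": the factor's underlying codes
class(cell_x)   # "factor"

# Each cell is one string, so whole strings are compared and none match
intersect(as.character(cell_x), as.character(cell_y))  # character(0)

# Splitting on commas first yields element-wise vectors that do overlap
x <- as.numeric(strsplit(as.character(cell_x), ",")[[1]])
y <- as.numeric(strsplit(as.character(cell_y), ",")[[1]])
intersect(x, y)  # 2 3
```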
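We can split the 'mass' column by 'type', split each string on commas using strsplit, unlist, convert to numeric, keep the unique elements, and then apply intersect (through Reduce) to find the elements common to all list elements. Because the column may have been read as a factor (which is why typeof reports integer), it is converted with as.character before splitting.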
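# One sorted numeric vector of unique values per "type" (named x and y);
# as.character() makes strsplit() work even if mass is a factor
lst <- setNames(lapply(split(df$mass, df$type), function(x)
  sort(unique(as.numeric(unlist(strsplit(as.character(x), ",")))))), c("x", "y"))
# Elements common to every list element, i.e. intersect(lst$x, lst$y)
Reduce(intersect, lst)
#[1] 1 2 3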
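Splitting only by type pools hypotheses a and b, so the code above returns 1 2 3 rather than the 2 3 expected for hypothesis a alone. If x and y should be restricted to one hypothesis, as in the question, one possible sketch splits the rows by hypothesis first and then intersects across types within each group (common_by_hyp is a hypothetical name).

```r
df <- data.frame(hypothesis = c("a", "b", "a", "b"),
                 type = c("datatype 1", "datatype 1", "datatype 2", "datatype 2"),
                 mass = c("2,3", "1,2", "1,2,3,4,5", "3"),
                 stringsAsFactors = FALSE)

# For each hypothesis: parse mass per type, then intersect across types
common_by_hyp <- lapply(split(df, df$hypothesis), function(d) {
  per_type <- lapply(split(d$mass, d$type), function(m)
    sort(unique(as.numeric(unlist(strsplit(as.character(m), ","))))))
  Reduce(intersect, per_type)
})

common_by_hyp$a  # 2 3
common_by_hyp$b  # numeric(0): "1,2" and "3" share nothing
```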