Please could you advise on the R code I could use in order to do the following operation :
let's say list N
:
n1
n2
n3
n4
and a list M
:
m1
m2
m3
m4
m5
C
, where for some pairs of coordinates (n,m) from the lists above, we have a numerical intensity; for example:
n1; m1; 100
n1; m2; 300
The question would be : what is the most efficient R code I could use in order to integrate the list N
, the list M
, and the dataframe C
, in order to obtain a dataframe with:
N
as the columns names M
as the rows names N
* M
, corresponding to the numerical values in the data frame C
.A little example would be :
n1 n2 n3 n4
m1 100 - - -
m2 300 - - -
m3 - - - -
m4 - - - -
m5 - - - -
You can do this using spread
in the tidyr
package, making sure to keep all values of n and m that appear in the two lists, even if they don't appear in C:
library(tidyr)
## Replicating the data
listN = list("n1","n2","n3","n4","n5")
listM = list("m1","m2","m3","m4","m5")
C = data.frame(n=c("n1","n2","n3"),m=c("m1","m2","m3"),I=c(100,300,400))
n m I
1 n1 m1 100
2 n2 m2 300
3 n3 m3 400
## Defining factor levels of C using listM and listN, and keeping unused levels when doing the spread
C$n = factor(C$n,levels=unlist(listN))
C$m = factor(C$m,levels=unlist(listM))
res = spread(C,key="n",value="I",drop=F)
This returns:
m n1 n2 n3 n4 n5
1 m1 100 NA NA NA NA
2 m2 NA 300 NA NA NA
3 m3 NA NA 400 NA NA
4 m4 NA NA NA NA NA
5 m5 NA NA NA NA NA