My data has one column and I am trying to create additional columns with what’s after each “/” in the rows. Here are the first few rows of the data:
mydata <- data.frame(
ALL = factor(c(
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/hi.on/hi.ov",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/hi.on/hi.ov",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/cni_at.p3x.4",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/cbr_LBL"
))
)
The result should look like this (data frame) with a “1” in the new column if the variable appears in the row and “0” if not:
Result <- data.frame(
ALL = factor(c(
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/hi.on/hi.ov",
"/ca/put/sent_1fe. gr/eq2_on/eq2_off/hi.on/hi.ov",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/cni_at.p3x.4",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/cbr_LBL"
)),
ca = rep(1L, 5L),
put = rep(1L, 5L),
sent_1 = rep(1L, 5L),
fe.gr = rep(1L, 5L),
eq2_on = rep(1L, 5L),
eq2_off = rep(1L, 5L),
hi.on = c(0L, 1L, 1L, 0L, 0L),
hi.ov = c(0L, 1L, 1L, 0L, 0L),
cni_at.p3x.4 = c(0L, 0L, 0L, 1L, 0L),
cbr_LBL = rep(0:1, c(4L, 1L))
)
I have tried many functions including strsplit and sapply:
sapply(strsplit(as.character(mydata$ALL), “\\/”), “[[“, 2) #returns "ca"s only
sapply(strsplit(as.character(mydata$ALL), "\\/"), "[[", 3) #returns "put"s only
There are millions of rows and I’d greatly appreciate anything that is quick and efficient.
Using mtabuate
from the qdapTools package that I maintain:
library(qdapTools)
mtabulate(strsplit(as.character(dat[[1]]), "/"))
## V1 ca cbr_LBL cni_at.p3x.4 eq2_off eq2_on fe.gr hi.on hi.ov put sent_1 sent_1fe.gr
## 1 1 1 0 0 1 1 1 0 0 1 1 0
## 2 1 1 0 0 1 1 1 1 1 1 1 0
## 3 1 1 0 0 1 1 0 1 1 1 0 1
## 4 1 1 0 1 1 1 1 0 0 1 1 0
## 5 1 1 1 0 1 1 1 0 0 1 1 0