I have data like
set.seed(6)
df <- data.frame(t = as.Date("2014/1/1")+seq(0,100.25,.25),
name = paste(sample(c("Alert_","NonOp_"),402,replace=TRUE),
sample(1:10,402,replace=TRUE),sep=""),
unit = c(rep(1:10,each=40),10,10))
head(df)
Here is some representative data
head(df)
t name unit
1 2014-01-01 NonOp_3 1
2 2014-01-01 NonOp_6 1
3 2014-01-01 Alert_5 1
4 2014-01-01 Alert_7 1
5 2014-01-02 NonOp_4 1
6 2014-01-02 NonOp_2 1
How to generate a table from the names, where the table has columns of unit, t, and then names in the name column gets coerced into columns of name factored without the Alert/NonOp, and the values in the factored names columns should be NA, A (for alert), and N (for NonOp). Here is the type of table I'm looking for, if all the numbers above were for unit 1.
unit t name_1 name_2 name_3 name_4 name_5 name_6 name_7 name_8 ...
1 2014-01-01 NA NA N NA A A A NA
1 2014-01-02 NA N NA N NA NA NA NA
The goal is to the named alerts / nonops into a table ordered by unit / t and write the table to a file. And read the file into excel.
It sounds like the following is what you're looking for:
library(reshape2)
newdf <- cbind(df, colsplit(df$name, "_", c("V1", "V2")))
newdf$V1 <- factor(newdf$V1, c("NonOp", "Alert"), c("N", "A"))
newdf$V2 <- paste0("name_", newdf$V2)
head(newdf)
# t name unit V1 V2
# 1 2014-01-01 NonOp_3 1 N name_3
# 2 2014-01-01 NonOp_6 1 N name_6
# 3 2014-01-01 Alert_5 1 A name_5
# 4 2014-01-01 Alert_7 1 A name_7
# 5 2014-01-02 NonOp_4 1 N name_4
# 6 2014-01-02 NonOp_2 1 N name_2
head(dcast(newdf, t ~ V2, value.var = "V1"))
# t name_1 name_10 name_2 name_3 name_4 name_5 name_6 name_7 name_8 name_9
# 1 2014-01-01 <NA> <NA> <NA> N <NA> <NA> <NA> <NA> <NA> <NA>
# 2 2014-01-01 <NA> <NA> <NA> <NA> <NA> <NA> N <NA> <NA> <NA>
# 3 2014-01-01 <NA> <NA> <NA> <NA> <NA> A <NA> <NA> <NA> <NA>
# 4 2014-01-01 <NA> <NA> <NA> <NA> <NA> <NA> <NA> A <NA> <NA>
# 5 2014-01-02 <NA> <NA> <NA> <NA> N <NA> <NA> <NA> <NA> <NA>
# 6 2014-01-02 <NA> <NA> N <NA> <NA> <NA> <NA> <NA> <NA> <NA>
Basically, start by splitting the "name" column into two columns, and then use dcast
. The other steps are mostly cosmetic.
Another approach would be to use my cSplit
function and dcast.data.table
from the "data.table" package.
Skipping things like creating "name_blah" and replacing "NonOp" with "N" and so on, you can just directly do:
dcast.data.table(cSplit(df, "name", "_"), t ~ name_2, value.var = "name_1")
# t 1 10 2 3 4 5 6 7 8 9
# 1: 2014-01-01 NA NA NA NonOp NA NA NA NA NA NA
# 2: 2014-01-01 NA NA NA NA NA NA NonOp NA NA NA
# 3: 2014-01-01 NA NA NA NA NA Alert NA NA NA NA
# 4: 2014-01-01 NA NA NA NA NA NA NA Alert NA NA
# 5: 2014-01-02 NA NA NA NA NonOp NA NA NA NA NA
# ---
# 398: 2014-04-10 NA NA NA NA NA NA NonOp NA NA NA
# 399: 2014-04-10 NA NA NA NA NonOp NA NA NA NA NA
# 400: 2014-04-10 NonOp NA NA NA NA NA NA NA NA NA
# 401: 2014-04-11 NA NA NA NonOp NA NA NA NA NA NA
# 402: 2014-04-11 NA NA Alert NA NA NA NA NA NA NA