Search code examples
sql-serverrdcast

How to replicate dcast() from R in SQL Server


Currently I'm using R to transform my table data of the form:

ID    Code    Condition    WT
104   KEENTRAN  CON4      .30577
.     .         .          .
.     .         .          .

The link should work for anyone who would like to download my dataframe, otherwise here's a subset:

>dput(head(df))

structure(list(ID = c(104L, 368L, 10632L, 20385L, 24361L, 34378L
), Code = c("KEENTRAN", "ALEXEXPR", "MINNEXPMN", "JACKMOVWI", 
            "FREICOIN", "JBXEXPGA"), Condition = c("CON4", "CON4", "CON2", 
                                                   "CON2", "CON6", "CON5"), WT = c(0.3057717456, 0.7909870604, 1, 
                                                                                   1, 0.4301040524, 0.5977268575)), .Names = c("ID", "Code", "Condition", 
                                                                                                                               "WT"), class = c("tbl_df", "data.frame"), row.names = c(NA, -6L
                                                                                                                               ))

Background

My example data is in long form where the Condition variable ranges from "CON1" to "CON6", I want to recast my data into wide form, where the ID and Code values will be primary keys and the levels from Condition will be columns whose values take on the maximum value of WT for that particular ID, Code, Condition grouping (or zero if no such pairing exists). This can easily be accomplished in R using the dcast() function from the reshape2 package:

library(reshape2)

Result <- df %>% group_by(ID, Condition) %>% 
  summarise(value = max(as.numeric(WT))) %>% 
  dcast(ID ~ Condition)

Result[is.na(Result)] <- 0

I would like to replicate this data manipulation procedure in SQL Server, but I'm not sure how best to do so. Any help or insight is greatly appreciated.


Solution

  • Consider SQL Server's PIVOT operation:

    SELECT t.ID, 
           ISNULL(t.[CON1], 0) AS [CON1],
           ISNULL(t.[CON2], 0) AS [CON2],
           ISNULL(t.[CON3], 0) AS [CON3],
           ISNULL(t.[CON4], 0) AS [CON4],
           ISNULL(t.[CON5], 0) AS [CON5],
           ISNULL(t.[CON6], 0) AS [CON6]    
    FROM RDataFrame As r                  
    PIVOT 
      (
        MAX(r.[WT]) 
        FOR r.Condition IN ([CON1], [CON2], [CON3], [CON4], [CON5], [CON6])
      ) AS t
    
    -- ID   CON1    CON2    CON3    CON4            CON5    CON6
    -- 8    0       0       0       0.4394051665    0       0
    -- 10   0       0       0       0.6013843825    0       0
    -- 15   0       0       0       0.07231002554   0       0
    -- 21   0       0       0       0.6013843825    0       0
    -- 23   0       0       0       0.7720454793    0       0
    -- 80   0       1       0       0               0       0
    -- 104  0       0       0       0.3057717456    0       0
    -- 144  0       0       0       0.1430937996    0       0.2646439667
    -- 145  0       0       0       0.8276574       0       0
    -- 155  0       1       0       0.8977280575    0       0
    -- 156  0       0       0       0.8453629338    0       0
    -- 158  0       0       0       0.5221399019    0       0