Search code examples
rplyrdenormalized

Span matrix from relational table in R


Often I use data stemming from a SQL database in R. Typically I do a lot of the juggling in SQL, but using plyr more and more recently I wondered whether it's easier in R to span a matrix from a relational data table. Here's an example.

I have a table like

id   question answer
 1        6      10
 1        4       1
 1        5     2003
 3        6       2

 #reproduce it with dput output:
 structure(list(result = c(1, 1, 1, 3), question = c(6, 4, 5, 
 6), answer = c("10", "1", "2003", "2")), .Names = c("id", 
 "question", "answer"), row.names = c("1", "2", "3", "4"), class = "data.frame")

and I would like to arrange as de-normalized matrix:

id question.6 question.4 question.5
1       10        1          2003
3        2     

and so on.. I fixed this in SQL using a CASE WHEN syntax but can't manage to do it in R, for example like this:

  Select min((case when (question_id` = 6)
  then answer end)) AS `question.6`

Solution

  • dcast in reshape2 will do that work:

    > z
      id question answer
    1  1        6     10
    2  1        4      1
    3  1        5   2003
    4  3        6      2
    > dcast(z, id ~ question, value_var = "answer")
      id    4    5  6
    1  1    1 2003 10
    2  3 <NA> <NA>  2