Search code examples
rdataframematrixtranspose

Transpose of data.frame turns first row into a list


I have data as follows:

library(data.table)
DF <- structure(list(toberevised = c("Number of returns", "Number of joint returns", 
"Number with paid preparer's signature"), `SOUTH DAKOTA_All returns` = c(135257620, 
52607676, 80455243), `SOUTH DAKOTA_Under_50000` = c(92150166, 
20743943, 53622647)), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"))

I would like the first column as variables and the variables in a column so I did:

DF<- as.data.frame(t(DF))
setnames(DF, DF[1,])

But then I get the error:

Passed a vector of type 'list'. Needs to be type 'character'

I have tried everything I could think of to unlist the list, but to no avail.

What am I doing wrong here?


Solution

  • Transposing a data.frame is dangerous because t() returns a matrix where all elements ("cells") have been coerced to the same data type:

    t(DF)
    
                             [,1]                [,2]                      [,3]                                   
    toberevised              "Number of returns" "Number of joint returns" "Number with paid preparer's signature"
    SOUTH DAKOTA_All returns "135257620"         " 52607676"               " 80455243"                            
    SOUTH DAKOTA_Under_50000 "92150166"          "20743943"                "53622647"
    

    Now, all numeric values have been coerced to type character which was not intended, propably.

    As mentioned several times before, here and here, I recommend to reshape data into a tidy format, i.e., long format, to simplify data processing:

    library(data.table)
    long <- melt(DF, id.vars = "toberevised")
    long
    
                                 toberevised                 variable     value
    1:                     Number of returns SOUTH DAKOTA_All returns 135257620
    2:               Number of joint returns SOUTH DAKOTA_All returns  52607676
    3: Number with paid preparer's signature SOUTH DAKOTA_All returns  80455243
    4:                     Number of returns SOUTH DAKOTA_Under_50000  92150166
    5:               Number of joint returns SOUTH DAKOTA_Under_50000  20743943
    6: Number with paid preparer's signature SOUTH DAKOTA_Under_50000  53622647
    

    From long format, we can reshape into the desired wide format:

    dcast(long, variable ~ toberevised) 
    
                       variable Number of joint returns Number of returns Number with paid preparer's signature
    1: SOUTH DAKOTA_All returns                52607676         135257620                              80455243
    2: SOUTH DAKOTA_Under_50000                20743943          92150166                              53622647
    

    Now, the numbers are still of type numeric.


    As a rule of thumb, whenever a column name is to be treated as an attribute, like SOUTH DAKOTA_Under_50000 it is likely that the data is not in tidy format. Attributes should be stored and treated as data items in order to use it for subsetting, grouping, and aggregation.

    Actually, SOUTH DAKOTA_Under_50000 contains two attributes, a region and a classification.