I have data as follows:
library(data.table)
DF <- structure(list(toberevised = c("Number of returns", "Number of joint returns",
"Number with paid preparer's signature"), `SOUTH DAKOTA_All returns` = c(135257620,
52607676, 80455243), `SOUTH DAKOTA_Under_50000` = c(92150166,
20743943, 53622647)), row.names = c(NA, -3L), class = c("data.table",
"data.frame"))
I would like the first column as variables and the variables in a column so I did:
DF<- as.data.frame(t(DF))
setnames(DF, DF[1,])
But then I get the error:
Passed a vector of type 'list'. Needs to be type 'character'
I have tried everything I could think of to unlist the list, but to no avail.
What am I doing wrong here?
Transposing a data.frame is dangerous because t()
returns a matrix where all elements ("cells") have been coerced to the same data type:
t(DF)
[,1] [,2] [,3] toberevised "Number of returns" "Number of joint returns" "Number with paid preparer's signature" SOUTH DAKOTA_All returns "135257620" " 52607676" " 80455243" SOUTH DAKOTA_Under_50000 "92150166" "20743943" "53622647"
Now, all numeric values have been coerced to type character which was not intended, propably.
As mentioned several times before, here and here, I recommend to reshape data into a tidy format, i.e., long format, to simplify data processing:
library(data.table)
long <- melt(DF, id.vars = "toberevised")
long
toberevised variable value 1: Number of returns SOUTH DAKOTA_All returns 135257620 2: Number of joint returns SOUTH DAKOTA_All returns 52607676 3: Number with paid preparer's signature SOUTH DAKOTA_All returns 80455243 4: Number of returns SOUTH DAKOTA_Under_50000 92150166 5: Number of joint returns SOUTH DAKOTA_Under_50000 20743943 6: Number with paid preparer's signature SOUTH DAKOTA_Under_50000 53622647
From long format, we can reshape into the desired wide format:
dcast(long, variable ~ toberevised)
variable Number of joint returns Number of returns Number with paid preparer's signature 1: SOUTH DAKOTA_All returns 52607676 135257620 80455243 2: SOUTH DAKOTA_Under_50000 20743943 92150166 53622647
Now, the numbers are still of type numeric.
As a rule of thumb, whenever a column name is to be treated as an attribute, like SOUTH DAKOTA_Under_50000
it is likely that the data is not in tidy format. Attributes should be stored and treated as data items in order to use it for subsetting, grouping, and aggregation.
Actually, SOUTH DAKOTA_Under_50000
contains two attributes, a region and a classification.