Search code examples
rdataframetransformreshapedcast

Reshaping a non column name dataframe in r


I have a data frame like this:

structure(list(...1 = c(NA, NA, "name_var1", "obs1_var1", "obs2_var1"
), ...2 = c(NA, NA, "name_var2", "obs1_var2", "obs2_var2"), ...3 = c(NA, 
NA, "name_var3", "obs1_var3", "obs2_var3"), ...4 = c("Dimension", 
"Subdimension", "name_var4", "obs1_var4", "obs2_var4"), ...5 = c("Dimension1", 
"Subdimension1", "question1.1.1", "1", "4"), ...6 = c("Dimension1", 
"Subdimension1", "question1.1.2", "3", "2"), ...5.1 = c("Dimension1", 
"Subdimension2", "question1.2.1", "1", "2"), ...5.2 = c("Dimension1", 
"Subdimension2", "question1.2.2", "4", "1"), ...5.3 = c("Dimension2", 
"Subdimension1", "question2.1.1", "1", "4"), ...6.1 = c("Dimension2", 
"Subdimension1", "question2.1.2", "3", "2"), ...5.4 = c("Dimension2", 
"Subdimension2", "question2.2.1", "1", "2"), ...5.5 = c("Dimension2", 
"Subdimension2", "question2.2.2", "4", "1")), class = "data.frame", row.names = c(NA, 
-5L))

and I'd like to transform into this one

structure(list(name_var1 = c("obs1_var1", "obs1_var1", "obs1_var1", 
"obs1_var1", "obs1_var1", "obs1_var1", "obs1_var1", "obs1_var1"
), name_var2 = c("obs1_var2", "obs1_var2", "obs1_var2", "obs1_var2", 
"obs1_var2", "obs1_var2", "obs1_var2", "obs1_var2"), name_var3 = c("obs1_var3", 
"obs1_var3", "obs1_var3", "obs1_var3", "obs1_var3", "obs1_var3", 
"obs1_var3", "obs1_var3"), name_var4 = c("obs1_var4", "obs1_var4", 
"obs1_var4", "obs1_var4", "obs1_var4", "obs1_var4", "obs1_var4", 
"obs1_var4"), Dimension = c("Dimension1", "Dimension1", "Dimension1", 
"Dimension1", "Dimension2", "Dimension2", "Dimension2", "Dimension2"
), Subdimension = c("Subdimension1", "Subdimension1", "Subdimension2", 
"Subdimension2", "Subdimension1", "Subdimension1", "Subdimension2", 
"Subdimension2"), Question = c("question1.1.1", "question1.1.2", 
"question1.2.1", "question1.2.2", "question2.1.1", "question2.1.2", 
"question2.2.1", "question2.2.2"), Value = c(1, 3, 1, 4, 1, 3, 
1, 4)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L))

...And continuing with all observations there are in the original dataframe. Any idea how to do this?

Thank you in advance for your comments and assistance.


Solution

  • If you have to do a lot of this kind of data transformation, then I recommend the unpivotr package. There is a related package called tidyxl that is useful for reading Excel files in a raw, cell-by-cell way. This can be useful when column headers are in text, but the cells below are numbers, dates, logical, etc. Also, you can even use formatting information which is sometimes necessary for properly extracting information from an xlsx file.

    The package author made a free online book Spreadsheet Munging Strategies that goes through many situations. In your case, you could use

    library(tidyverse)
    library(unpivotr)
    
    start %>% 
        as_cells() %>% 
        behead("up", Dimension) %>% 
        behead("up", Subdimension) %>%
        behead("up", Question) %>%
        behead("left", name_var1) %>% 
        behead("left", name_var2) %>% 
        behead("left", name_var3) %>% 
        behead("left", name_var4) %>% 
        select(name_var1:name_var4, Dimension:Question, Value = chr)
    

    where start is your starting data frame. Actually, it's best to read the raw excel file using the tidyxl package, but it's not necessary.