Search code examples
rpivottidyversetidyrreshape2

Pivot longer: Multiple rows to columns in R


I'm currently trying to work out how to pivot my dataframe (small dput below). Currently one column contains information on the country, ISO code, industry and sector. I need this information to be spread into 4 columns, with one corresponding value column. I've used melt and pivot_long functions before, but not sure how to produce 4 new columns as well as the value column.

DI_SMALL <- structure(list(V1 = structure(c(NA, NA, NA, NA, 1L, 1L, 1L, 1L
), .Label = "Energy Usage (TJ)", class = "factor"), V2 = structure(c(NA, 
NA, NA, NA, 2L, 1L, 4L, 3L), .Label = c("Coal", "Natural Gas", 
"Nuclear Electricity", "Petroleum"), class = "factor"), V3 = structure(c(5L, 
4L, 7L, 6L, 3L, 2L, 1L, 1L), .Label = c("0", "1.29327085460648e-05", 
"1.59504500372979e-05", "AFG", "Afghanistan", "Agriculture", 
"Industries"), class = "factor"), V4 = structure(c(5L, 4L, 7L, 
6L, 3L, 2L, 1L, 1L), .Label = c("0", "6.53466630114587e-06", 
"8.05944706428482e-06", "AFG", "Afghanistan", "Fishing", "Industries"
), class = "factor"), V5 = structure(c(5L, 4L, 6L, 7L, 3L, 2L, 
1L, 1L), .Label = c("0", "1.88562621206664e-05", "2.32557880912235e-05", 
"AFG", "Afghanistan", "Industries", "Mining and Quarrying"), class = "factor"), 
    V6 = structure(c(5L, 4L, 7L, 6L, 3L, 2L, 1L, 1L), .Label = c("0", 
    "2.00284547443433e-05", "2.47018365704401e-05", "AFG", "Afghanistan", 
    "Food & Beverages", "Industries"), class = "factor")), row.names = c("V1", 
"V2", "V3", "V4", "X", "X.1", "X.2", "X.3"), class = "data.frame")

Ideally the output would then contain 7 columns. The existing first to columns, Country, ISO, Industry and Sector then the Value.Like this:

Output <- structure(list(NA. = structure(c(1L, 1L, 1L, 1L), .Label = "Energy Usage (TJ)", class = "factor"), 
    NA..1 = structure(c(2L, 1L, 4L, 3L), .Label = c("Coal ", 
    "Natural Gas", "Nuclear Electricity", "Petroleum"), class = "factor"), 
    Country = structure(c(1L, 1L, 1L, 1L), .Label = "Afghanistan", class = "factor"), 
    ISO = structure(c(1L, 1L, 1L, 1L), .Label = "AFG", class = "factor"), 
    Industry = structure(c(1L, 1L, 1L, 1L), .Label = "Industries", class = "factor"), 
    Sector = structure(c(1L, 1L, 1L, 1L), .Label = "Agriculture", class = "factor"), 
    Value = c(1.595045004, 1.2932706, 0, 0)), class = "data.frame", row.names = c(NA, 
-4L))

Hopefully that makes sense, any ideas would be greatly appreciated!

Thanks


Solution

  • This is not a case where pivot_long is suitable because you have variables mapped to both rows and columns, and they are not names of the columns/rows. Intead you have to extract these attributes from the variables and then build the data.frame "manually". Here's an example, I suggest checking the variable values in each step for better understanding the process here:

    library(dplyr)
    
    df <- DI_SMALL %>% 
      mutate_all(as.character) 
    
    row_attr <-  paste0(df$V1, "/", df$V2)
    row_attr <- row_attr[row_attr!= "NA/NA"]
    
    col_attr <- df[1:4, -(1:2)] %>%
      apply(MARGIN = 2, function(x) paste0(x, collapse = "/"))
    
    values <- df[-(1:4), -(1:2)] %>%
      mutate_all(as.numeric) %>%
      as.matrix() %>%
      c()
    
    out <- expand.grid(row_attr, col_attr)
    out <- cbind(out, values)
    
    out <- out %>% 
      tidyr::separate(col = "Var1", into = c("NA.", "NA..1"), sep = "/") %>%
      tidyr::separate(col = "Var2", 
                      into = c("Country", "ISO", "Industry", "Sector"),
                      sep = "/")
    
    out[1:4]
    

    I think the results in Output and in the values of the DI_SMALL are in different scales, but other than that, this seems like the desired output.

                    NA.               NA..1     Country ISO   Industry      Sector       values
    1 Energy Usage (TJ)         Natural Gas Afghanistan AFG Industries Agriculture 1.595045e-05
    2 Energy Usage (TJ)                Coal Afghanistan AFG Industries Agriculture 1.293271e-05
    3 Energy Usage (TJ)           Petroleum Afghanistan AFG Industries Agriculture 0.000000e+00
    4 Energy Usage (TJ) Nuclear Electricity Afghanistan AFG Industries Agriculture 0.000000e+00