I'm currently trying to work out how to pivot my dataframe (small dput below). Currently one column contains information on the country, ISO code, industry and sector. I need this information to be spread into 4 columns, with one corresponding value column. I've used melt and pivot_long functions before, but not sure how to produce 4 new columns as well as the value column.
DI_SMALL <- structure(list(V1 = structure(c(NA, NA, NA, NA, 1L, 1L, 1L, 1L
), .Label = "Energy Usage (TJ)", class = "factor"), V2 = structure(c(NA,
NA, NA, NA, 2L, 1L, 4L, 3L), .Label = c("Coal", "Natural Gas",
"Nuclear Electricity", "Petroleum"), class = "factor"), V3 = structure(c(5L,
4L, 7L, 6L, 3L, 2L, 1L, 1L), .Label = c("0", "1.29327085460648e-05",
"1.59504500372979e-05", "AFG", "Afghanistan", "Agriculture",
"Industries"), class = "factor"), V4 = structure(c(5L, 4L, 7L,
6L, 3L, 2L, 1L, 1L), .Label = c("0", "6.53466630114587e-06",
"8.05944706428482e-06", "AFG", "Afghanistan", "Fishing", "Industries"
), class = "factor"), V5 = structure(c(5L, 4L, 6L, 7L, 3L, 2L,
1L, 1L), .Label = c("0", "1.88562621206664e-05", "2.32557880912235e-05",
"AFG", "Afghanistan", "Industries", "Mining and Quarrying"), class = "factor"),
V6 = structure(c(5L, 4L, 7L, 6L, 3L, 2L, 1L, 1L), .Label = c("0",
"2.00284547443433e-05", "2.47018365704401e-05", "AFG", "Afghanistan",
"Food & Beverages", "Industries"), class = "factor")), row.names = c("V1",
"V2", "V3", "V4", "X", "X.1", "X.2", "X.3"), class = "data.frame")
Ideally the output would then contain 7 columns. The existing first to columns, Country, ISO, Industry and Sector then the Value.Like this:
Output <- structure(list(NA. = structure(c(1L, 1L, 1L, 1L), .Label = "Energy Usage (TJ)", class = "factor"),
NA..1 = structure(c(2L, 1L, 4L, 3L), .Label = c("Coal ",
"Natural Gas", "Nuclear Electricity", "Petroleum"), class = "factor"),
Country = structure(c(1L, 1L, 1L, 1L), .Label = "Afghanistan", class = "factor"),
ISO = structure(c(1L, 1L, 1L, 1L), .Label = "AFG", class = "factor"),
Industry = structure(c(1L, 1L, 1L, 1L), .Label = "Industries", class = "factor"),
Sector = structure(c(1L, 1L, 1L, 1L), .Label = "Agriculture", class = "factor"),
Value = c(1.595045004, 1.2932706, 0, 0)), class = "data.frame", row.names = c(NA,
-4L))
Hopefully that makes sense, any ideas would be greatly appreciated!
Thanks
This is not a case where pivot_long
is suitable because you have variables mapped to both rows and columns, and they are not names of the columns/rows. Intead you have to extract these attributes from the variables and then build the data.frame
"manually". Here's an example, I suggest checking the variable values in each step for better understanding the process here:
library(dplyr)
df <- DI_SMALL %>%
mutate_all(as.character)
row_attr <- paste0(df$V1, "/", df$V2)
row_attr <- row_attr[row_attr!= "NA/NA"]
col_attr <- df[1:4, -(1:2)] %>%
apply(MARGIN = 2, function(x) paste0(x, collapse = "/"))
values <- df[-(1:4), -(1:2)] %>%
mutate_all(as.numeric) %>%
as.matrix() %>%
c()
out <- expand.grid(row_attr, col_attr)
out <- cbind(out, values)
out <- out %>%
tidyr::separate(col = "Var1", into = c("NA.", "NA..1"), sep = "/") %>%
tidyr::separate(col = "Var2",
into = c("Country", "ISO", "Industry", "Sector"),
sep = "/")
out[1:4]
I think the results in Output
and in the values of the DI_SMALL
are in different scales, but other than that, this seems like the desired output.
NA. NA..1 Country ISO Industry Sector values
1 Energy Usage (TJ) Natural Gas Afghanistan AFG Industries Agriculture 1.595045e-05
2 Energy Usage (TJ) Coal Afghanistan AFG Industries Agriculture 1.293271e-05
3 Energy Usage (TJ) Petroleum Afghanistan AFG Industries Agriculture 0.000000e+00
4 Energy Usage (TJ) Nuclear Electricity Afghanistan AFG Industries Agriculture 0.000000e+00