Search code examples
rreshapedata-cleaningdata-wrangling

Reshape data and expand cell into column and rows


I have dataset looking like:

enter image description here

structure(list(Var1 = c("a", "b", "c"), Var2 = c(1, 1, 2), Var3 = c("V1: company1, V2: 6178, V3: yes, V4: 1711920\r\nV1: company2, V2: 6336, V3: no, V4: 1777513\r\nV1: company3, V2: 17995, V3: yes, V4: 1547923\r\n", 
"V1: company4, V2: 3234, V3: yes, V4: 1711920\r\nV1: company5, V2: 45435, V3: no, V4: 1777513", 
"V1: company1, V2: 6178, V3: yes, V4: 1711920\r\nV1: company2, V2: 6336, V3: no, V4: 1777513\r\nV1: company3, V2: 17995, V3: yes, V4: 1547923\r\nV1: company1, V2: 6178, V3: yes, V4: 1711920\r\nV1: company2, V2: 6336, V3: no, V4: 1777513\r\nV1: company3, V2: 17995, V3: yes, V4: 1547923"
)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
))

The last column "Var3" contains multiple fields and rows in each cell, I want to expand each cell in "Var3" into rows and columns depends on the listed information using R, my expectation is to have something similar to:

enter image description here

structure(list(Var1 = c("a", "a", "a", "b", "b", "c", "c", "c", 
"c", "c", "c"), Var2 = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2), Var3_V1 = c("company1", 
"company2", "company3", "company4", "company5", "company1", "company2", 
"company3", "company1", "company2", "company3"), Var3_V2 = c(6178, 
6336, 17995, 3234, 45435, 6178, 6336, 17995, 6178, 6336, 17995
), Var3_V3 = c("yes", "no", "yes", "yes", "no", "yes", "no", 
"yes", "yes", "no", "yes"), Var3_V4 = c(1711920, 1777513, 1547923, 
1711920, 1777513, 1711920, 1777513, 1547923, 1711920, 1777513, 
1547923)), row.names = c(NA, -11L), class = c("tbl_df", "tbl", 
"data.frame"))

Solution

    • You can use
    ans <- list()
    for(i in 1:nrow(df)){
        ans[[i]] <- read.table(textConnection(df$Var3[i]) , header = F , sep = ",")
        ans[[i]] <- lapply(ans[[i]] , \(x) sub("V\\d+:" , "" , x))
        ans[[i]][["Var1"]] <- rep(df$Var1[i] , length(ans[[i]][[1]]))
        ans[[i]][["Var2"]] <- rep(df$Var2[i] , length(ans[[i]][[1]]))
        
    }
    
    ans <- data.frame(do.call(rbind , lapply(ans , \(x) do.call(cbind , x))))
    
    ans <- ans[c("Var1" , "Var2" , "V1" , "V2", "V3" , "V4" )]
    
    • Output
    > ans
    
       Var1 Var2        V1      V2    V3        V4
    1     a    1  company1    6178   yes   1711920
    2     a    1  company2    6336    no   1777513
    3     a    1  company3   17995   yes   1547923
    4     b    1  company4    3234   yes   1711920
    5     b    1  company5   45435    no   1777513
    6     c    2  company1    6178   yes   1711920
    7     c    2  company2    6336    no   1777513
    8     c    2  company3   17995   yes   1547923
    9     c    2  company1    6178   yes   1711920
    10    c    2  company2    6336    no   1777513
    11    c    2  company3   17995   yes   1547923