Search code examples
rdplyrdatasetrow

How to separate in two different columns values that are located on two different columns as pivot format


I would like to create two columns from this dataset originating from just a singular one:

structure(list(Z = c("51", "51", "51", "51", "51", "51", "51", 
"51", "60", "60", "60", "60", "60", "60", "60", "60"), A = c("Unp", 
"Dip", "Dit", "Tip", "Unp (%)", "Dip (%)", "Dit (%)", "Tip (%)", 
"Unp", "Dip", "Dit", "Tip", "Unp (%)", "Dip (%)", "Dit (%)", 
"Tip (%)"), B = c(74, 190, 120, 284, 150, 147.222222222222, 101.39, 
125, 74, 179, 112, 265, 150, 139.583333333333, 106.94, 118.91
)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"
))

In such a way that the column A and B from the original dataset, will be set in that way like for this small example:

structure(list(var1 = c("A", "A (%)", "B", "B (%)"), var2 = c(1, 
23, 34, 245)), class = "data.frame", row.names = c(NA, -4L)) 

   var1 var2  var3  var4
1     A    1  A (%)   23
3     B   34  B (%)  245

Theferore I would like to move those row that differ just from one character (in this case the presence of (%)) but share a similar part. Any idea on how to do this?

Thanks


Solution

  • If the absolute values and percentages alternate every other row, in base R you could use cbind and %% this way, assigning your data as df:

    cbind(df[seq_len(nrow(df)) %% 2 != 0,],
          df[seq_len(nrow(df)) %% 2 == 0,])
    
    #   var1 var2  var1 var2
    # 1    A    1 A (%)   23
    # 3    B   34 B (%)  245
    

    Alternatively you could use do.call after spliting on the presence of a "%" using grep:

    do.call(cbind, split(df, grep("%", df$var1)))
    
    #   2.var1 2.var2 4.var1 4.var2
    # 1      A      1  A (%)     23
    # 3      B     34  B (%)    245
    

    In your specific sample data, you can first split on the like terms (regardless of %) using substr then split again on the "%" - essentially all I did was replace df$var1 with split(df, substr(df$var1, 1, 3)) then removed some extra/repeated columns:

    temp <- do.call(cbind, split(df, grep("%", split(df, substr(df$A, 1, 3)))))
    
    #remove columns
    final <- temp[,-grep("splt$|Z", names(temp))[-1]]
    
    #  1.Z     1.A 1.B     2.A      2.B     3.A    3.B     4.A    4.B
    #1  51     Unp  74     Dip 190.0000     Dit 120.00     Tip 284.00
    #2  51 Unp (%) 150 Dip (%) 147.2222 Dit (%) 101.39 Tip (%) 125.00
    #3  60     Unp  74     Dip 179.0000     Dit 112.00     Tip 265.00
    #4  60 Unp (%) 150 Dip (%) 139.5833 Dit (%) 106.94 Tip (%) 118.91