Wrangle dataframe in R, possibly with dcast

I have a fairly large data.frame that I need to wrangle a bit. The current structure is:

V1   V2 V3 V4 V5         V6        V7         V8         ...   Vn         Vn+1
chr1  1 A  T  sample_1   value_1   sample_2   value_4   ...   sample_n   value_7 
chr1 40 T  C  sample_1   value_2   sample_2   value_5   ...   sample_n   value_8
chr1 60 A  T  sample_1   value_3   sample_2   value_6   ...   sample_n   value_9
.
.
.
chrX 160 A  T  sample_1   value_x   sample_2   value_y   ...  sample_n value_ni

For example, for this data frame:

df <- structure(list(V1 = c(10L, 10L, 10L, 10L, 10L, 10L), V2 = c(3387501L, 
4174142L, 6419754L, 6419765L, 6419897L, 6419912L), V3 = c("T", 
"A", "C", "T", "G", "A"), V4 = c("A", 
"T", "A", "A", "C", "G"), V5 = c("LP2000748-DNA_H02", 
"LP2000748-DNA_H02", "LP2000748-DNA_H02", "LP2000748-DNA_H02", 
"LP2000748-DNA_H02", "LP2000748-DNA_H02"), V6 = c("0/0", "0/0", 
"1/1", "0/0", "0/0", "0/0"), V7 = c("LP2000748-DNA_A03", "LP2000748-DNA_A03", 
"LP2000748-DNA_A03", "LP2000748-DNA_A03", "LP2000748-DNA_A03", 
"LP2000748-DNA_A03"), V8 = c("0/0", "0/0", "1/1", "0/1", "0/0", 
"0/0"), V9 = c("LP2000795-DNA_B01", "LP2000795-DNA_B01", "LP2000795-DNA_B01", 
"LP2000795-DNA_B01", "LP2000795-DNA_B01", "LP2000795-DNA_B01"
), V10 = c("0/0", "0/0", "1/1", "0/0", "0/0", "0/0")), row.names = c(NA, 
-6L), class = c("data.table", "data.frame"))

What I'd like to have in the end is a table like this:

V1   V2 V3 V4 sample_1   sample_2   ...   sample_n    
chr1  1 A  T   value_1    value_4   ...    value_7 
chr1 40 T  C   value_2    value_5   ...    value_8
chr1 60 A  T   value_3    value_6   ...    value_9
.
.
.
chrX 160 A  T   value_x    value_y   ...  value_ni

What I've tried so far in R is:

samples_data <- seq(from = 5, to = dim(df)[2], by = 2)
variable_data <- samples_data + 1

new_df <- reshape2::dcast(df, V1 + V2 + V3 ~ colnames(df)[samples_data], value.var= colnames(df)[variable_data])

but I get this error message:

  recursive indexing failed at level 2
In addition: Warning message:
In if (!(value.var %in% names(data))) { :
  the condition has length > 1 and only the first element will be used

Does anyone have any suggestion on how to tackle this problem or on how to reshape the df?

Thanks!

Solution

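dcast fails here because value.var expects a single column name and you are passing several, which is what the warning is telling you. You probably need to un-nest the data first, i.e. stack the sample/value pairs into long format, and then reshape to wide. To un-nest, you could use Map to generate a list of data frames, each holding the first four ID columns plus one sample/value pair from the remaining columns (5,6; 7,8; 9,10). Then rbind the result and reshape.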

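# the dput has class data.table; as a plain data.frame, df[cols] selects columns
df <- as.data.frame(df)
cseq <- 5:ncol(df)
# stack each sample/value pair (columns 5,6; 7,8; 9,10) under the four ID columns
tmp <- do.call(rbind, Map(function(x, y) setNames(df[c(1:4, x:y)],
                                                  c(names(df)[1:4], c("sample", "value"))),
                   cseq[cseq %% 2 != 0], cseq[cseq %% 2 == 0]))
# one row per ID combination, one value column per sample
res <- reshape(tmp, idvar=names(df)[1:4], timevar="sample", v.names="value", direction="wide")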
res
#   V1      V2 V3 V4 value.LP2000748-DNA_H02 value.LP2000748-DNA_A03 value.LP2000795-DNA_B01
# 1 10 3387501  T  A                     0/0                     0/0                     0/0
# 2 10 4174142  A  T                     0/0                     0/0                     0/0
# 3 10 6419754  C  A                     1/1                     1/1                     1/1
# 4 10 6419765  T  A                     0/0                     0/1                     0/0
# 5 10 6419897  G  C                     0/0                     0/0                     0/0
# 6 10 6419912  A  G                     0/0                     0/0                     0/0
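
The columns above come out as value.<sample>, whereas the question asks for the bare sample names. A minimal sketch of one way to get there, assuming a plain data.frame and that no sample name clashes with an ID column name. The names df_small, long and wide are made up for this illustration, and df_small is just a two-row, two-sample excerpt of the question's data:

```r
# Two-row, two-sample excerpt of the question's data (illustrative only).
df_small <- data.frame(
  V1 = c(10L, 10L),
  V2 = c(6419754L, 6419765L),
  V3 = c("C", "T"),
  V4 = c("A", "A"),
  V5 = "LP2000748-DNA_H02",
  V6 = c("1/1", "0/0"),
  V7 = "LP2000748-DNA_A03",
  V8 = c("1/1", "0/1"),
  stringsAsFactors = FALSE
)

id_cols     <- names(df_small)[1:4]
sample_cols <- seq(5, ncol(df_small), by = 2)  # sample-name columns: 5, 7, ...

# Stack each (sample, value) column pair under the ID columns (long format).
long <- do.call(rbind, lapply(sample_cols, function(i) {
  setNames(df_small[c(1:4, i, i + 1)], c(id_cols, "sample", "value"))
}))

# Back to wide: one row per ID combination, one column per sample.
wide <- reshape(long, idvar = id_cols, timevar = "sample",
                v.names = "value", direction = "wide")

# reshape() names the new columns "value.<sample>"; strip that prefix.
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

After the sub() call the value columns carry the sample IDs themselves, which is the layout asked for in the question. The same sub() line should work on res from the code above.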