This dataset (gbsgre) gives genome map positions (chr and start) with the total sequencing coverage (depth) at each position, summed over 20 individuals (stored in dat).
Example:
gbsgre <- "chr start end depth
chr1 3273 3273 7
chr1 3274 3274 3
chr1 3275 3275 8
chr1 3276 3276 4
chr1 3277 3277 25"
gbsgre <- read.table(text=gbsgre, header=TRUE)
Each individual's dataset gives genome map positions (V1 = chr, V2 = start) with that individual's coverage (V3 = depth) at each position.
Example:
df1 <- "chr start depth
chr1 3273 4
chr1 3276 4
chr1 3277 15"
df1 <- read.table(text=df1, header=TRUE)
df2 <- "chr start depth
chr1 3273 3
chr1 3274 3
chr1 3275 8
chr1 3277 10"
df2 <- read.table(text=df2, header=TRUE)
# one data frame per individual (20 in the real data)
dat <- list(df1, df2)
> dat
[[1]]
chr start depth
1 chr1 3273 4
2 chr1 3276 4
3 chr1 3277 15
[[2]]
chr start depth
1 chr1 3273 3
2 chr1 3274 3
3 chr1 3275 8
4 chr1 3277 10
Matching on the chr and start positions in gbsgre, I need to merge the depths (V3) of all 20 animals ([[1]] to [[20]]) into the main table (gbsgre) to produce a final table as follows:
The first column (V1) will be the chromosome, the second (V2) the start position, the third (V3) the depth from "gbsgre", the fourth (V4) the depth (dat/V3) of [[1]] from "dat", and so on, up to the twenty-third column, which will hold the depth of [[20]] from "dat".
Importantly, a position that is missing from an individual's data should be treated as zero ("0") for that individual.
The final table must also have the same number of rows as "gbsgre".
#Example Result
> GBSMeDIP
chr start depth depth1 depth2
1: chr1 3273 7 4 3
2: chr1 3274 3 0 3
3: chr1 3275 8 0 8
4: chr1 3276 4 4 0
5: chr1 3277 25 15 10
Using data.table:
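library(data.table)
# set names on your list `dat` first (setattr modifies it in place)
setattr(dat, 'names', paste0("depth", seq_along(dat)))
# bind them by rows and reshape to wide form; positions absent for an individual become 0
dcast(rbindlist(dat, idcol="id"), chr + start ~ id, value.var="depth", fill=0L)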
# chr start depth1 depth2
# 1: chr1 3273 4 3
# 2: chr1 3274 0 3
# 3: chr1 3275 0 8
# 4: chr1 3276 4 0
# 5: chr1 3277 15 10
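The dcast() result above holds only the per-individual depths: it leaves out gbsgre's own depth column, and a gbsgre position that none of the individuals covers would not appear in it at all. A minimal sketch of one way to finish the job, assuming each element of dat is a data frame with chr, start and depth columns as in the question: left-join the wide table onto gbsgre with data.table's merge(..., all.x = TRUE) so the row set is exactly gbsgre's, then replace the resulting NAs with 0. The names wide and ind_cols are illustrative only.

```r
library(data.table)

# Example inputs copied from the question (the real `dat` has 20 elements)
gbsgre <- read.table(text = "chr start end depth
chr1 3273 3273 7
chr1 3274 3274 3
chr1 3275 3275 8
chr1 3276 3276 4
chr1 3277 3277 25", header = TRUE)
df1 <- read.table(text = "chr start depth
chr1 3273 4
chr1 3276 4
chr1 3277 15", header = TRUE)
df2 <- read.table(text = "chr start depth
chr1 3273 3
chr1 3274 3
chr1 3275 8
chr1 3277 10", header = TRUE)
dat <- list(depth1 = df1, depth2 = df2)

# Wide table: one depth column per list element, 0 where an individual lacks a position
wide <- dcast(rbindlist(dat, idcol = "id"), chr + start ~ id,
              value.var = "depth", fill = 0L)

# Left join onto gbsgre (dropping `end`) so the result keeps gbsgre's rows and order
GBSMeDIP <- merge(as.data.table(gbsgre)[, .(chr, start, depth)], wide,
                  by = c("chr", "start"), all.x = TRUE, sort = FALSE)

# Positions covered by no individual come back as NA from the join; treat them as 0
ind_cols <- names(dat)
for (col in ind_cols) {
  set(GBSMeDIP, which(is.na(GBSMeDIP[[col]])), col, 0L)
}

print(GBSMeDIP)
```

With the example data this reproduces the "#Example Result" table. The left join, rather than relying on dcast() alone, is what guarantees the final table has the same number of rows as gbsgre.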