I think the answer involves split() and a for loop, but as I am fairly new to R I can't quite get it to work. I have a data frame like this:
row.names start end length transcript
1 NM_008866.1 22 714 693 NM_008866
2 NM_008866.2 125 196 72 NM_008866
3 NM_008866.3 129 242 114 NM_008866
...
14 NM_001159750.37 221 1123 903 NM_001159750
15 NM_001159750.40 453 557 105 NM_001159750
16 NM_001159750.41 570 644 75 NM_001159750
...
and a DNAStringSet like this:
A DNAStringSet instance of length 2
width seq names
[1] 2433 GCACTGTCCGCCAGCCGGTGGATGTGCG...TGTGAAATAAAATTTAATTTTGGCTTTA NM_008866
[2] 2668 ACTTCTACTTTCCAGTCTCCTGCGATCG...TCAATAAAGTTTTTTGTTGTTAAACATA NM_001159750
For every transcript name I want to apply a function (subseq()) to the matching sequence in the DNAStringSet (matched by name). subseq() should take the start and end columns of my data frame as its arguments, one row at a time.
My attempt so far (I think I should split both the data frame and the DNAStringSet, right?):
results <- list()
for (myName in names(dataframe)){
localdf<- dataframe[[myName]]
localseqsplit <- dataset[[myName]]
results<-subseq(localseqsplit,start=localdf$start,end=localdf$end)
temp<-results[[myName]]
return(temp)
}
Since you haven't provided a reproducible example or representative output, here is my initial guess at what you are looking for.
# make a very basic working example
library(Biostrings)
df <- read.table(header=TRUE, text='
row.names start end length transcript
NM_008866.1 10 18 9 NM_008866
NM_008866.2 15 22 8 NM_008866
NM_008866.3 19 28 10 NM_008866
NM_001159750.37 5 22 18 NM_001159750
NM_001159750.40 8 30 23 NM_001159750
NM_001159750.41 12 32 21 NM_001159750')
# create the DNAStringSet
x0 <- c(NM_008866 = "GCACTGTCCGCCAGCCGGTGGATGTGCG", NM_001159750="ACTTCTACTTTCCAGTCTCCTGCGATCGAAGC")
dna <- DNAStringSet(x0)
# split your dataset by transcript name
df_split <- split(df, f=df$transcript)
results <- list()
for(myName in names(dna)){
  # get the index of the transcript you are working with
  index <- which(names(dna) == myName)
  # make sure the transcript is in your dataset
  if(myName %in% names(df_split)){
    # loop through the rows (start/end pairs) for this transcript
    for(j in seq_len(nrow(df_split[[myName]]))){
      # take the given dna string and create a substring from the given indices
      dna_sub <- subseq(dna[index], start=df_split[[myName]]$start[j], end=df_split[[myName]]$end[j])
      # append results to list element with transcript name
      results[[myName]] <- append(results[[myName]], dna_sub)
    }
  }
}
results
> results
$NM_008866
A DNAStringSet instance of length 3
width seq names
[1] 9 GCCAGCCGG NM_008866
[2] 8 CCGGTGGA NM_008866
[3] 10 TGGATGTGCG NM_008866
$NM_001159750
A DNAStringSet instance of length 3
width seq names
[1] 18 CTACTTTCCAGTCTCCTG NM_001159750
[2] 23 CTTTCCAGTCTCCTGCGATCGAA NM_001159750
[3] 21 CCAGTCTCCTGCGATCGAAGC NM_001159750
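If the nested loop becomes slow on a large table, a vectorized variant may be worth trying. The sketch below assumes that subseq() accepts one start and end value per sequence when given a DNAStringSet, and that a DNAStringSet can be indexed by a character vector of names with repeats. It rebuilds the same toy data so it can run on its own. The column name id and the object names parents, subs and results_vec are made up for illustration.

```r
library(Biostrings)

# Same toy data as in the example above (the 'id' column name is hypothetical)
df <- data.frame(
  id = c("NM_008866.1", "NM_008866.2", "NM_008866.3",
         "NM_001159750.37", "NM_001159750.40", "NM_001159750.41"),
  start = c(10, 15, 19, 5, 8, 12),
  end = c(18, 22, 28, 22, 30, 32),
  transcript = c(rep("NM_008866", 3), rep("NM_001159750", 3)),
  stringsAsFactors = FALSE
)
dna <- DNAStringSet(c(NM_008866 = "GCACTGTCCGCCAGCCGGTGGATGTGCG",
                      NM_001159750 = "ACTTCTACTTTCCAGTCTCCTGCGATCGAAGC"))

# Pick the parent sequence for every row by name (one copy per row)
parents <- dna[df$transcript]

# Cut all subsequences in one call: row i uses start[i] and end[i]
subs <- subseq(parents, start = df$start, end = df$end)
names(subs) <- df$id

# Group the pieces by transcript into a plain list of DNAStringSets
results_vec <- lapply(split(seq_along(subs), df$transcript),
                      function(i) subs[i])
```

Every transcript in df must be present in names(dna) for the lookup to succeed, so it may be sensible to filter df with df$transcript %in% names(dna) first.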