I am using the matchPattern function from the Biostrings package to find particular sequences in the genome. Once they are found, I want to show a frequency distribution of the spacing between the matched instances.
Example: running the following code
Match1 <- matchPattern(ResEnz, genome$chr1)
Match1
will return this:
Views on a 248956422-letter DNAString subject
subject: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
views:
start end width
[1] 27974 27979 6 [GAATTC]
[2] 29889 29894 6 [GAATTC]
[3] 32212 32217 6 [GAATTC]
[4] 36941 36946 6 [GAATTC]
[5] 49920 49925 6 [GAATTC]
... ... ... ... ...
[67137] 248927762 248927767 6 [GAATTC]
[67138] 248928956 248928961 6 [GAATTC]
[67139] 248929077 248929082 6 [GAATTC]
[67140] 248932486 248932491 6 [GAATTC]
[67141] 248941974 248941979 6 [GAATTC]
Now I want to use this data to build a vector containing the differences between the end point of one entry and the start point of the next one (ignoring the very first start point and the very last end point).
I.e., for Match1 the vector would be:
1910, 2318, 4724, 12974, ..., 9483
The generated Match1 object is of class XStringViews, and the names function returns NA for it, so I am not sure how to extract the positions. Please help.
Upon further investigation, I found that start(Match1) and end(Match1) return vectors containing the values of interest. I'll leave this here in case someone else runs into the same problem.
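
For anyone landing here, this is a minimal sketch of how the start and end vectors could be combined into the spacing vector described above. The helper name spacing_between is my own (hypothetical, not part of Biostrings), and the numbers are just the first five rows of the output shown in the question, so the block runs in base R without the genome loaded.

```r
# Hypothetical helper (not a Biostrings function): distance from the end of
# each match to the start of the next match.
spacing_between <- function(starts, ends) {
  stopifnot(length(starts) == length(ends))
  n <- length(starts)
  if (n < 2) return(numeric(0))
  # Drop the very first start and the very last end, then subtract pairwise.
  starts[-1] - ends[-n]
}

# First five matches copied from the printed Match1 output in the question.
starts <- c(27974, 29889, 32212, 36941, 49920)
ends   <- c(27979, 29894, 32217, 36946, 49925)

spacing <- spacing_between(starts, ends)
print(spacing)  # 1910 2318 4724 12974

# With the real object (assumes Biostrings is loaded and Match1 exists):
# spacing <- spacing_between(start(Match1), end(Match1))
# hist(spacing, breaks = 100, xlab = "Distance between matches (bp)")
```

Note that this follows the definition in the question (next start minus previous end), so two directly adjacent matches would give 1 rather than 0. Subtract 1 if the number of bases strictly between matches is what you want.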