r bioconductor genetics

Extracting values from matchPattern


I am using the matchPattern function from the Biostrings package to find particular sequences in a genome. Once they are found, I want to show a frequency distribution of the spacing between the matched instances.

Example: running the following code

library(Biostrings)
# ResEnz holds the pattern to search for (GAATTC in the output below)
Match1 <- matchPattern(ResEnz, genome$chr1)
Match1

will return this:

Views on a 248956422-letter DNAString subject
subject: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
views:
            start       end width
    [1]     27974     27979     6 [GAATTC]
    [2]     29889     29894     6 [GAATTC]
    [3]     32212     32217     6 [GAATTC]
    [4]     36941     36946     6 [GAATTC]
    [5]     49920     49925     6 [GAATTC]
    ...       ...       ...   ... ...
[67137] 248927762 248927767     6 [GAATTC]
[67138] 248928956 248928961     6 [GAATTC]
[67139] 248929077 248929082     6 [GAATTC]
[67140] 248932486 248932491     6 [GAATTC]
[67141] 248941974 248941979     6 [GAATTC]

Now I want to use this data to build a vector of the differences between the end point of one match and the start point of the next one (ignoring the very first start point and the very last end point).

I.e., for Match1

1910, 2318, 4724, 12974 .... 9483 

The generated Match1 object is of class XStringViews, and the names function returns NA, so I am currently unsure how to extract these values. Please help.


Solution

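  • On further investigation, I found that start(Match1) and end(Match1) return vectors containing the start and end positions of the matches. Subtracting each end position from the start position of the following match gives the spacing described in the question. I'll leave this here in case someone else runs into the same problem.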
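
As a minimal sketch of how those two vectors could be combined, the example below computes the spacing on a small made-up sequence. The toy `subject`, the literal pattern "GAATTC", and the variable names `starts`, `ends`, and `spacing` are assumptions for illustration; in the question, the subject would be genome$chr1 and the pattern would be ResEnz.

```r
library(Biostrings)

# Toy subject standing in for genome$chr1 (hypothetical sequence, not real data).
subject <- DNAString("AAGAATTCAAAAGAATTCCCGAATTCTT")
matches <- matchPattern("GAATTC", subject)

# start() and end() return integer vectors, one element per match.
starts <- start(matches)
ends   <- end(matches)

# Drop the first start and the last end, then subtract element-wise:
# start of match i+1 minus end of match i, as defined in the question.
spacing <- starts[-1] - ends[-length(ends)]
spacing
# → 5 3

# One possible way to view the frequency distribution of the spacing.
hist(spacing, main = "Spacing between matches", xlab = "start[i+1] - end[i]")
```

This follows the definition used in the question, so adjacent matches with no bases between them would give a spacing of 1 rather than 0; subtract 1 from `spacing` if the number of intervening bases is what is needed.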