r bioinformatics bioconductor bed

Average interval lengths in BED files with Bioconductor


I'm trying to do something pretty simple, but I haven't figured it out: I want the average length of all intervals in a BED file that I have imported into R. The file contains no overlapping intervals. This is what the imported object looks like:

GRanges object with 12917252 ranges and 3 metadata columns:
         seqnames               ranges strand |                 name     score                thick
            <Rle>            <IRanges>  <Rle> |          <character> <numeric>            <IRanges>
     [1]     chr1       [10524, 10551]      + |        1:10524-10551       122       [10538, 10538]
     [2]     chr1       [11236, 11258]      + |        1:11236-11258        43       [11247, 11247]
     [3]     chr1       [11456, 11474]      + |        1:11456-11474        47       [11465, 11465]
     [4]     chr1       [12054, 12099]      + |        1:12054-12099       206       [12077, 12077]
     [5]     chr1       [12276, 12330]      + |        1:12276-12330       249       [12303, 12303]

The operation only needs to apply to the ranges column.


Solution

  • Using IRanges::width():

    library(GenomicRanges) #loads IRanges, too.
    
    #dummy data
    gr = GRanges("chr1",IRanges(
      start = c(11236, 11456, 12054, 12276),
      end = c(11258, 11474, 12099, 12330)))
    
    #width() returns the length of each range; mean() then averages those lengths.
    mean(width(gr))
    # [1] 35.75
    

    ?width

    width(x): The number of integer values in each range. This is a vector of non-negative integers of the same length as x.
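Because width() counts every integer position in a range, both endpoints included, it should equal end - start + 1 for each interval. The sketch below checks that on toy data and also shows one possible way to get a per-chromosome average. The two chr2 intervals are made up for illustration, and the variable names (overall_mean, per_chr) are placeholders, not anything from the question.

```r
library(GenomicRanges)

# Toy data: four chr1 intervals from the question plus two
# hypothetical chr2 intervals added only for illustration.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
  ranges = IRanges(
    start = c(10524, 11236, 11456, 12054, 100, 200),
    end   = c(10551, 11258, 11474, 12099, 109, 229)))

# width() includes both endpoints, so it matches end - start + 1.
stopifnot(all(width(gr) == end(gr) - start(gr) + 1))

# Mean interval length over the whole object.
overall_mean <- mean(width(gr))

# Mean interval length per chromosome, grouping widths by sequence name.
per_chr <- tapply(width(gr), as.character(seqnames(gr)), mean)

overall_mean
per_chr
```

On a real import the same two calls apply unchanged to the GRanges object; with about 13 million ranges, width() is vectorized, so no loop over intervals is needed.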