
Convert a data.frame in R to .bed format file


I have a data.frame that looks like this.

bed <- data.frame(chrom = rep("Chr1", 5),
                  chromStart = c(18915152, 24199229, 73730, 81430, 89350),
                  chromEnd = c(18915034, 24199347, 74684, 81550, 89768),
                  strand = c("-", "+", "+", "+", "+"))

write.table(bed, "test_xRNA.bed", row.names = FALSE, col.names = FALSE, sep = "\t", quote = FALSE)

Created on 2022-07-29 by the reprex package (v2.0.1)

I want to convert it into a BED file. I tried the write.table function, but when I run bedtools intersect on the resulting file I get this error:

Error: unable to open file or unable to determine types for file test_xRNA.bed

- Please ensure that your file is TAB delimited (e.g., cat -t FILE).
- Also ensure that your file has integer chromosome coordinates in the 
  expected columns (e.g., cols 2 and 3 for BED).

Any ideas of how I can properly convert a data.frame to a .bed file in R?

I have heard about the rtracklayer package. Does anyone have experience with it?

I have tried the approach from the post "export file from R in bed format", but it does not work for me. Any help is highly appreciated.


Solution

  • I think it's somewhat more involved to make a BED file. Here is a solution I have been working on over the last few days:

    suppressPackageStartupMessages(library(GenomicRanges))
    suppressPackageStartupMessages(library(rtracklayer))
    suppressPackageStartupMessages(library(tidyverse))
    
    # data 
    bed <- data.frame(chrom=c(rep("Chr1",5)),
                      chromStart=c(18915152,24199229,73730,81430,89350),
                      chromEnd=c(18915034,24199347,74684,81550,89768), 
                      strand=c("-","+","+","+","+"))
    
    # swap coordinates where needed so that chromStart <= chromEnd in every row
    # (transform() evaluates both expressions against the original columns)
    bed2 <- bed |>
      transform(chromStart = pmin(chromStart, chromEnd),
                chromEnd = pmax(chromStart, chromEnd))
    
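    # Genomic Ranges: the chrom/chromStart/chromEnd/strand columns are detected automatically
    # starts are treated as 1-based by default; pass starts.in.df.are.0based = TRUE
    # if the data.frame already holds 0-based BED starts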
    bed3 <- GenomicRanges::makeGRangesFromDataFrame(bed2)
    head(bed3)
    #> GRanges object with 5 ranges and 0 metadata columns:
    #>       seqnames            ranges strand
    #>          <Rle>         <IRanges>  <Rle>
    #>   [1]     Chr1 18915034-18915152      -
    #>   [2]     Chr1 24199229-24199347      +
    #>   [3]     Chr1       73730-74684      +
    #>   [4]     Chr1       81430-81550      +
    #>   [5]     Chr1       89350-89768      +
    #>   -------
    #>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
    
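    # rtracklayer: called without a file, export() returns the BED lines as a
    # character vector, with starts converted to 0-based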
    bed4 <- rtracklayer::export(bed3, format="bed", ignore.strand = FALSE)
    bed4
    #> [1] "Chr1\t18915033\t18915152\t.\t0\t-" "Chr1\t24199228\t24199347\t.\t0\t+"
    #> [3] "Chr1\t73729\t74684\t.\t0\t+"       "Chr1\t81429\t81550\t.\t0\t+"      
    #> [5] "Chr1\t89349\t89768\t.\t0\t+"
    
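    # write the lines to a BED file
    # quote = FALSE matters here: otherwise every line would be wrapped in double quotes
    # note that append = TRUE adds to an existing test.bed, so remove the file before re-running
    write.table(bed4, "test.bed", sep = "\t", col.names = FALSE, row.names = FALSE, append = TRUE, quote = FALSE)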
    

    Created on 2022-08-02 by the reprex package (v2.0.1)

    You now have a functional BED file that works with bedtools.
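
For comparison, the same six-column lines can be produced in base R without Bioconductor. This is only a minimal sketch, not the method used above: it assumes the input coordinates are 1-based (which is how the GRanges route treated them), and the `bed6`, `lo`, `hi` and `out_file` names and the temporary output path are made up for illustration. If your starts are already 0-based, drop the `- 1`.

```r
# Base-R sketch; the output path and the 1-based assumption are illustrative.
bed <- data.frame(chrom      = rep("Chr1", 5),
                  chromStart = c(18915152, 24199229, 73730, 81430, 89350),
                  chromEnd   = c(18915034, 24199347, 74684, 81550, 89768),
                  strand     = c("-", "+", "+", "+", "+"))

# Order each coordinate pair so that start <= end, whatever the strand.
lo <- pmin(bed$chromStart, bed$chromEnd)
hi <- pmax(bed$chromStart, bed$chromEnd)

# Six-column BED: chrom, start, end, name, score, strand.
# Assumption: input is 1-based, so the start is shifted by -1 (BED starts are 0-based).
# as.integer() keeps values such as 100000 from being written as 1e+05.
bed6 <- data.frame(chrom  = bed$chrom,
                   start  = as.integer(lo - 1),
                   end    = as.integer(hi),
                   name   = ".",
                   score  = 0L,
                   strand = bed$strand)

out_file <- file.path(tempdir(), "test_base.bed")  # hypothetical output path
write.table(bed6, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back to inspect the tab-delimited lines.
readLines(out_file)
```

Strand sits in column 6 here, with placeholder name and score columns in front of it, which matches the layout of the rtracklayer output above. Coercing the coordinates to integer is a cheap guard against scientific notation in the written file.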