
Reading BED files into pandas dataframe (windows)


For a bioinformatics project, I would like to read a .BED file into a pandas dataframe, but I have no clue how to do it or which tools/programs are required. Nothing I found on the internet was really applicable to me, as I am working on Windows 10 with Python 3.7 (Anaconda distribution).

Any help would be appreciated.


Solution

  • According to https://software.broadinstitute.org/software/igv/BED:

    A BED file (.bed) is a tab-delimited text file that defines a feature track.

    According to http://genome.ucsc.edu/FAQ/FAQformat#format1 it contains up to 12 fields (columns) and may include header lines starting with the word 'track'. The following is a minimal program to read such a BED file into a pandas dataframe.

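    import io
    import pandas as pd
    
    # Keep only data lines: drop 'track'/'browser' header lines and '#' comments
    with open('so58178958.bed') as fh:
        data = ''.join(line for line in fh
                       if not line.startswith(('track', 'browser', '#')))
    
    df = pd.read_csv(io.StringIO(data), sep='\t', header=None)
    header = ['chrom', 'chromStart', 'chromEnd', 'name', 'score', 'strand', 'thickStart', 'thickEnd', 'itemRgb', 'blockCount', 'blockSizes', 'blockStarts']
    df.columns = header[:len(df.columns)]
    

    This is just a very simple code snippet. It skips header lines starting with 'track' or 'browser' and comment lines starting with '#', hands the remaining tab-separated lines to pandas, and names only as many columns as the file actually has. The filtering is done in Python rather than with the comment parameter of read_csv because pandas treats the comment character as the start of a comment wherever it appears in a line, not only at the beginning, so a single character such as 't' would also cut off every field containing that letter. Apart from pandas, which ships with Anaconda, no extra tools are needed, so this works on Windows as well.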
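    If you need this in more than one place, the reading steps above can be wrapped in a small function. This is a minimal sketch, assuming the file is plain tab-separated BED text; read_bed, BED_COLUMNS and SKIP_PREFIXES are hypothetical names chosen here, not part of pandas. It also forces the chrom column to strings, since files that use names such as 1 instead of chr1 would otherwise get an integer column.

```python
import io

import pandas as pd

# Column names from the UCSC BED definition; a file may use the first 3 to 12.
BED_COLUMNS = ['chrom', 'chromStart', 'chromEnd', 'name', 'score', 'strand',
               'thickStart', 'thickEnd', 'itemRgb', 'blockCount',
               'blockSizes', 'blockStarts']

# Lines with these prefixes are headers or comments, not features.
SKIP_PREFIXES = ('track', 'browser', '#')


def _data_lines(lines):
    """Join every line that is not a header or comment line."""
    return ''.join(line for line in lines if not line.startswith(SKIP_PREFIXES))


def read_bed(source):
    """Read a BED file (path or open text file) into a DataFrame.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    if hasattr(source, 'read'):
        data = _data_lines(source)      # already-open file or io.StringIO
    else:
        with open(source) as fh:        # a path such as 'so58178958.bed'
            data = _data_lines(fh)
    # Read chrom as text so numeric names like '1' are not turned into integers.
    df = pd.read_csv(io.StringIO(data), sep='\t', header=None, dtype={0: str})
    df.columns = BED_COLUMNS[:len(df.columns)]
    return df


# Usage with a small made-up example (a file path works the same way).
sample = io.StringIO(
    'track name=demo description="made-up example"\n'
    '# a comment line\n'
    'chr1\t100\t200\ttranscript1\t0\t+\n'
    'chr1\t150\t300\ttranscript2\t0\t-\n'
)
df = read_bed(sample)
print(df)
```

    Accepting either a path or an open file keeps the function easy to test without touching the disk; on Windows a raw string such as r'C:\data\file.bed' avoids backslash escapes in the path.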