python, loops, python-2.6, genetics

Python script to make basic file with chromosome information


I am trying to make some .bed files for genetic analysis. I am a Python beginner. The files I want to make should have 3 tab-separated columns: the first column is always the same (the chromosome name), and the 2nd and 3rd columns are the start and end of windows of size 200, starting at zero and ending at the end of the chromosome. E.g.:

chr20 0 200
chr20 200 400
chr20 400 600
chr20 600 800
...

I have the size of the chromosome, so at the moment I am trying to say 'while column 2 < (size of chrom), print line'. I have a skeleton of a script but it is not quite working, due to my lack of experience. Here is what I have so far:

output = open('/homw/genotyping/wholegenome/Chr20.bed', 'rw') 

column2 = 0
column1 = 0
while column2 < 55268282:
    for line in output:
        column1 = column1 + 0
        column2 = column2 + 100

        print output >> "chr20" + '\t' + str(column1) + '\t' + str(column2)

If anyone can fix this simple script so that it does what I described, or write a better solution, that would be really appreciated. I considered making a script that could output all the files for 20 chromosomes and chrX, but as I need to specify the size of each chromosome I think I'll have to do each file separately.

Thanks in advance!


Solution

  • How about this:

    step = 200  # window size: each value increases by this amount
    with open('Chr20.bed', 'w') as outfp:
        # 1000 is a demo limit; use the chromosome size (55268282 in the question) for the full file
        for val in range(0, 1000, step):
            outfp.write('{0}\t{1:d}\t{2:d}\n'.format('chr20', val, val + step))
    

    This gives tab-delimited output, as requested:

    chr20   0   200
    chr20   200 400
    chr20   400 600
    chr20   600 800
    chr20   800 1000
    

    Note: using with closes the file for you automatically when you are done, or if an exception is encountered.

    The Python documentation for str.format() gives more information about the format syntax, in case you are curious.
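
To cover a whole chromosome rather than only the first 1000 bases, the same loop could be wrapped in a small function. This is only a sketch: the names iter_windows, write_bed, write_all and chrom_sizes are made up for illustration, and it assumes the last window should be cut off at the chromosome end rather than running past it. Only the chr20 size (55268282) comes from the question; sizes for the other chromosomes would have to be filled in.

```python
def iter_windows(size, step=200):
    """Yield (start, end) pairs covering 0..size in steps of `step`.

    Assumption: the final window is capped at `size`, so it may be
    shorter than `step`.
    """
    for start in range(0, size, step):
        yield start, min(start + step, size)


def write_bed(chrom, size, path, step=200):
    """Write one tab-separated line per window: chrom, start, end."""
    with open(path, 'w') as outfp:
        for start, end in iter_windows(size, step):
            outfp.write('{0}\t{1:d}\t{2:d}\n'.format(chrom, start, end))


def write_all(chrom_sizes, step=200):
    """Write '<chrom>.bed' in the current directory for every entry."""
    for chrom, size in chrom_sizes.items():
        write_bed(chrom, size, chrom + '.bed', step)


# Only the chr20 size is taken from the question; add the other
# chromosomes (chr1 ... chrX) with their real sizes before use.
chrom_sizes = {'chr20': 55268282}

# Example call (not run here, since it writes files):
# write_all(chrom_sizes)
```

Keeping the sizes in one dictionary means a single script can produce every file, which gets around having to run each chromosome separately.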