
Read file contents in Python conditionally


I am trying to read one chromosome sequence from a genome file in Python. The format of the genome file is as follows, but with more lines of sequence for each chromosome:

Chr1

ATCGTGTGATGGTGCGTAGATGCTGAT

GCTGATGTGTCGAGCGATGCTGAGTCG

Chr2

TGCGTGATGCTGAGCGATGCTGATGCT

TAGCTGACCACACACCTGTTTTGTAGG

Chr3

CAGTCGTAGCGATGCTGATGATGCTGA

GGTTGGTTGGCGGACCACCATTACTAT

I use the following code to read the whole genome sequence. However, I only want the sequence of one chromosome (e.g. the whole sequence of Chr2). Rather than reading the whole genome and then searching for Chr2, is there another way to do this?

Thank you

   genome = []  # the list must exist before appending to it
   with open("genome.txt") as f:
       for line in f:
           genome.append(line.rstrip())

Solution

  • Open the file and read it line by line until you find the 'Chr2' header.

    Then collect every non-empty line until you reach EOF or the next line beginning with 'Chr'.

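    def getgenomes(gfile):
        # Collect sequence lines from the current file position
        # up to the next chromosome header or EOF.
        g = []
        for line in gfile:
            if line.startswith('Chr'):
                break
            if (line := line.strip()):  # skip blank lines (walrus needs Python 3.8+)
                g.append(line)
        return g
    
    with open('genome.txt', encoding='utf-8') as gfile:
        genomes = None  # stays None if 'Chr2' is never found
        for line in gfile:
            # Compare the whole header so 'Chr2' does not also match 'Chr20'.
            if line.strip() == 'Chr2':
                genomes = getgenomes(gfile)
                break
        print(genomes)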
    

    output:

    ['TGCGTGATGCTGAGCGATGCTGATGCT', 'TAGCTGACCACACACCTGTTTTGTAGG']
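
If you want the chromosome as one string rather than a list of lines, the same idea can be wrapped in a reusable function. The following is only a sketch: the name `read_chromosome` and the `header_prefix` parameter are hypothetical, and it assumes every header line starts with 'Chr' and that no sequence line does. It still stops reading as soon as the next header appears, so the rest of the file is never loaded.

```python
def read_chromosome(path, name, header_prefix="Chr"):
    """Return the sequence of one chromosome as a single string.

    Returns None if no header line equal to `name` is found.
    `header_prefix` is an assumption about how header lines look.
    """
    parts = []
    found = False
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(header_prefix):
                if found:
                    break  # reached the next header: stop reading
                # Exact comparison, so 'Chr2' does not match 'Chr20'.
                found = (line == name)
                continue
            if found:
                parts.append(line)
    return "".join(parts) if found else None
```

Calling `read_chromosome("genome.txt", "Chr2")` on the file from the question should give the two Chr2 lines joined into one string.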