How to print out certain lines in a file in Python


I would like some help in figuring out how I can print out only a given number of lines in a .txt file.

I created a function file(x, y) with two parameters: the first, 'x', is the file name, and the second, 'y', decides how many lines it is going to print.

Example: let's say the file's name is x.txt and its contents are:

>Sentence 1
I like playing games
>Sentence 2
I like jumping around
>Sentence 3
I like dancing
>Sentence 4
I like swimming
>Sentence 5
I like riding my bike

What I want is for the function to read the file and print the sentences when I call file("x.txt", 3), so that it prints only the first 3 sentences, as in this sample output:

'I like playing games'
'I like jumping around'
'I like dancing'

Here is what I have done so far:

def file(x, y):
    file = open(x, 'r')
    g = list(range(y))
    h = [a for i, a in enumerate(file) if i in g]
    return " ' ".join(h)

I wasn't able to figure out how to have the program print the number of lines that the user inputs. So far, when I run the program, this is what I get:

>Sentence 1
 ' I like playing games
 ' >Sentence 2

I only want it to print the sentences, and I don't want it to print the ">Sentence #" part.

Will someone be able to help me figure this out? Thank You!


Solution

  • A simple native Python solution. I'm assuming lines that don't start with > are the 'sentence' lines:

    from itertools import islice
    
    def extract_lines(in_file, num):
        with open(in_file) as in_f:
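            # Skip the '>' header lines and strip each trailing newline,
            # otherwise '\n'.join() would leave a blank line between sentences
            gen = (line.rstrip('\n') for line in in_f if not line.startswith('>'))
            return '\n'.join(islice(gen, num))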
    

    But if this is actually FASTA format (it is now clear that it is), then I suggest using Biopython instead:

    from Bio import SeqIO
    from itertools import islice
    
    def extract_lines(in_file, num):
        with open(in_file) as in_f:
            gen = (record.seq for record in SeqIO.parse(in_f, 'fasta'))
            return list(islice(gen, num))
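
To check the idea against the sample file from the question, here is a minimal self-contained sketch. The names `first_sentences`, `demo` and `SAMPLE` are made up for illustration and are not part of the question or of any library. It writes the sample contents to a temporary file, filters out the `>` header lines before counting, and returns the sentences as a list so they can be printed one by one with quotes. The key point is the order: the original attempt counted raw lines first, so the headers used up part of the count.

```python
import os
import tempfile
from itertools import islice

# Sample contents copied from the question (hypothetical constant name).
SAMPLE = """>Sentence 1
I like playing games
>Sentence 2
I like jumping around
>Sentence 3
I like dancing
>Sentence 4
I like swimming
>Sentence 5
I like riding my bike
"""


def first_sentences(path, count):
    """Return the first `count` non-header lines, without trailing newlines."""
    with open(path) as handle:
        # Filter out header lines first, so islice counts sentences only.
        sentences = (
            line.rstrip("\n") for line in handle if not line.startswith(">")
        )
        # Materialise the result while the file is still open.
        return list(islice(sentences, count))


def demo():
    # Write the sample to a temporary file so the sketch runs on its own.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(SAMPLE)
        path = tmp.name
    try:
        return first_sentences(path, 3)
    finally:
        os.remove(path)


if __name__ == "__main__":
    for sentence in demo():
        # repr() adds the quotes shown in the question's sample output.
        print(repr(sentence))
```

Returning a list rather than a joined string leaves the formatting to the caller. This assumes one sentence per header, as in the sample; for real FASTA files with sequences wrapped over several lines, the Biopython version above is the safer choice.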