
How to select and print the lines of a file delimited by two identifiers


I have a large file full of data subsets, each with a unique identifier.

I want to find the first line containing a given identifier and print that line along with every line after it until the next data subset is reached (that line will start with <). The data is structured as shown below.

<ID1|ID1_x
AAA
BBB
CCC
<ID2|ID2_x
DDD
EEE
FFF
<ID3|ID3_x
...

I would like to print:

<(ID2)
DDD
EEE
FFF

So far I have:

with open('file.txt') as f:
    for line in f:
        if 'ID2' in line:
           print(line)
           ...


Solution

  • Try the code below:

    target_id = 'ID2'
    found_id = False
    with open('file.txt') as f:
        for line in f:
            if line.startswith('<'):
                # header line: the ID is the text between '<' and '|'
                header_id = line[1:].split('|')[0].strip()
                # compare the whole ID so that 'ID2' does not also match 'ID23'
                found_id = (header_id == target_id)
                if found_id:
                    print('<(' + header_id + ')')
            elif found_id:
                # remove carriage return and line feed
                print(line.rstrip('\r\n'))
    

    Running this code on my system with your file.txt produces this output:

    <(ID2)
    DDD
    EEE
    FFF
    

    Second question (from comment)

    To select both ID2 and ID23 (see the question in the comments on this answer), the program was changed as follows:

    target_ids = {'ID2', 'ID23'}
    found_id = False
    with open('file.txt') as f:
        for line in f:
            if line.startswith('<'):
                # header line: the ID is the text between '<' and '|'
                header_id = line[1:].split('|')[0].strip()
                # exact membership test, so 'ID20' or 'ID234' are not selected
                found_id = header_id in target_ids
                if found_id:
                    print('<(' + header_id + ')')
            elif found_id:
                # remove carriage return and line feed
                print(line.rstrip('\r\n'))
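
If you need this for an arbitrary set of identifiers, the same logic can be wrapped in a generator. The following is only a minimal sketch: the function name `iter_selected_blocks` is hypothetical, and the `ID23`, `GGG` and `HHH` entries in the sample data are made up for illustration (they are not in your file). It assumes that every header line starts with `<` and that the ID is the text between `<` and the first `|`.

```python
def iter_selected_blocks(lines, wanted_ids):
    """Yield the output lines for every block whose ID is in wanted_ids.

    Assumption: header lines start with '<' and the ID is the text
    between '<' and the first '|'.
    """
    wanted = set(wanted_ids)
    inside = False  # True while we are inside a selected block
    for line in lines:
        if line.startswith('<'):
            block_id = line[1:].split('|', 1)[0].strip()
            inside = block_id in wanted  # exact match, not a substring test
            if inside:
                yield '<(' + block_id + ')'
        elif inside:
            yield line.rstrip('\r\n')


# Made-up sample data; with a real file, pass the open file object instead.
sample = [
    '<ID1|ID1_x\n', 'AAA\n',
    '<ID2|ID2_x\n', 'DDD\n', 'EEE\n', 'FFF\n',
    '<ID23|ID23_x\n', 'GGG\n',
    '<ID3|ID3_x\n', 'HHH\n',
]

for out in iter_selected_blocks(sample, {'ID2'}):
    print(out)
```

With a real file you would iterate over `iter_selected_blocks(f, {'ID2', 'ID23'})` inside the `with open('file.txt') as f:` block. Because the function is a generator, it reads the file line by line and never loads the whole file into memory.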