Parsing data in blocks using a list


Input:

ID   information1
Aa   information1-1
Ba   information1-2
Ca   Homo sapiens
Da   information1-4
//
ID   information2
Aa   information2-1
Ba   information2-2
Ca   information2-3
Da   information2-4
//

Expected output:

ID   information1
Aa   information1-1
Ba   information1-2
Ca   Homo sapiens
Da   information1-4
//

Result:

ID   information1
ID   information1
Aa   information1-1
ID   information1
Aa   information1-1
Ba   information1-2
ID   information1
Aa   information1-1
Ba   information1-2
Ca   Homo sapiens
ID   information1
Aa   information1-1
Ba   information1-2
Ca   Homo sapiens
Da   information1-4
ID   information1
Aa   information1-1
Ba   information1-2
Ca   Homo sapiens
Da   information1-4
//

Code:

word = 'Homo sapiens'
with open(input_file, 'r') as input, open(output_file, 'w') as output:

    list_block = []
    str_block = ""

    for line in input:

        if not ("//" in line):
            str_block += line

        elif "//" in line:
            if word in str_block:
                list_block.append(str_block)
            str_block = ""

        output.write(str_block)

I have an input file made up of blocks of information, each terminated by a double slash ('//'). I want to extract only the blocks that contain 'Homo sapiens'. When I ran the code above, I got the output shown under 'Result' instead of the expected output. How can I fix my code?


Solution

  • Because your blocks are delimited by '//', it is easier to read the whole file into memory and split it on that delimiter. That gives you the list of blocks directly, and you then only need to write out the ones that contain the word. Here is an example that produces the desired output:

    word = 'Homo sapiens'

    with open(input_file, 'r') as fi, open(output_file, 'w') as fo:
        # read the file, split it into blocks and iterate over them
        for block in fi.read().split('//'):
            if word in block:
                fo.write(block)
                fo.write('//')
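
If you would rather keep the line-by-line approach from the question (for example, because the file is too large to read in one go), here is a minimal sketch of one way to do it. The code in the question calls output.write(str_block) on every pass through the loop, so the partially built block is written again for each line, and list_block is never written at all. The sketch below writes a block only once its '//' line has been reached. The names filter_blocks and delimiter are made up for illustration, and the sketch assumes the delimiter always sits on a line of its own.

```python
import io


def filter_blocks(lines, word, delimiter="//"):
    """Yield each complete block (delimiter line included) that contains `word`.

    Assumption: the delimiter is on its own line, as in the sample input.
    """
    block = []
    for line in lines:
        block.append(line)
        if line.strip() == delimiter:
            # The block is complete: decide once whether to keep it.
            text = "".join(block)
            if word in text:
                yield text
            block = []
    # Lines after the last delimiter form an incomplete block and are dropped.


# Small in-memory demo standing in for the real input file.
sample = (
    "ID   information1\n"
    "Ca   Homo sapiens\n"
    "//\n"
    "ID   information2\n"
    "Ca   information2-3\n"
    "//\n"
)
matched = "".join(filter_blocks(io.StringIO(sample), "Homo sapiens"))
print(matched, end="")
```

With real files, the same function can be used as fo.writelines(filter_blocks(fi, word)) inside the with statement shown above. Because it compares whole lines against the delimiter, a '//' that appears in the middle of a line (in a URL, for instance) does not end a block, which is a difference from the split('//') version.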