
Parsing a string pattern (Python)


I have a file with the following data:

<<row>>12|xyz|abc|2.34<</row>>
<<eof>>

The file may contain several rows like this. I am trying to design a parser that reads each row in the file and returns a list of all rows. What would be the best way to do it? The code has to be written in Python. It should either skip lines that do not start with <<row>> or raise an error for them.
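A minimal sketch of the "raise an error" variant, assuming every row fits on one line (the original assumption, before the update below). `parse_rows` is a hypothetical helper name; tolerating blank lines is also an assumption, not something the question specifies. It accepts any iterable of strings, so an open file object works directly.

```python
ROW_START = '<<row>>'
ROW_END = '<</row>>'
EOF_MARK = '<<eof>>'


def parse_rows(lines):
    """Parse single-line rows and raise ValueError on anything that is not a row.

    `lines` is any iterable of strings, e.g. an open file object.
    """
    rows = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()  # drop the trailing newline and surrounding whitespace
        if line == EOF_MARK:
            break  # end-of-data marker: stop reading
        if not line:
            continue  # assumption: blank lines are harmless and skipped
        if not (line.startswith(ROW_START) and line.endswith(ROW_END)):
            raise ValueError(f'line {lineno} is not a valid row: {line!r}')
        # strip the tags, then split the fields on '|'
        body = line[len(ROW_START):-len(ROW_END)]
        rows.append(body.split('|'))
    return rows


sample = ['<<row>>12|xyz|abc|2.34<</row>>\n', '<<eof>>\n']
print(parse_rows(sample))  # [['12', 'xyz', 'abc', '2.34']]
```

With a real file this would be `with open('input.txt') as f: rows = parse_rows(f)`.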

=======> UPDATE <========

I just found that a single <<row>> can span multiple lines, so neither my code nor the code below works anymore. Can someone suggest an efficient solution?

The data files can contain anywhere from hundreds to several thousand rows.
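One way to handle rows that span lines is a regular expression over the whole file: a few thousand rows is small enough to read into memory at once. This is a sketch, not a definitive parser. `parse_multiline_rows` and its `strict` flag are hypothetical names, and it assumes that line breaks inside a row are only wrapping, so they are removed before splitting on `|`. In strict mode, any non-whitespace text outside `<<row>>...<</row>>` raises `ValueError`.

```python
import re

# Non-greedy match from <<row>> to the next <</row>>; re.DOTALL lets '.' cross newlines.
ROW_RE = re.compile(r'<<row>>(.*?)<</row>>', re.DOTALL)


def parse_multiline_rows(text, strict=True):
    """Return a list of field lists, allowing a row to span several lines."""
    # Only look at data before the <<eof>> marker, if there is one.
    data = text.split('<<eof>>', 1)[0]
    rows = []
    last_end = 0
    for match in ROW_RE.finditer(data):
        gap = data[last_end:match.start()].strip()
        if strict and gap:
            # Text between rows: reject it instead of silently skipping it.
            raise ValueError(f'unexpected text outside a row: {gap!r}')
        # Assumption: embedded line breaks are wrapping artifacts, so drop them.
        body = match.group(1).replace('\r', '').replace('\n', '')
        rows.append(body.split('|'))
        last_end = match.end()
    trailing = data[last_end:].strip()
    if strict and trailing:
        # Also catches a <<row>> that is never closed.
        raise ValueError(f'unexpected trailing text: {trailing!r}')
    return rows


sample = '<<row>>12|xyz|\nabc|2.34<</row>>\n<<row>>13|foo|bar|5.0<</row>>\n<<eof>>\n'
print(parse_multiline_rows(sample))
# [['12', 'xyz', 'abc', '2.34'], ['13', 'foo', 'bar', '5.0']]
```

Usage with a file would be `with open('input.txt') as f: rows = parse_multiline_rows(f.read())`. The non-greedy `.*?` matters: a greedy `.*` would swallow everything from the first `<<row>>` to the last `<</row>>`.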


Solution

  • A simple way without regular expressions:

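    output = []
    with open('input.txt', 'r') as f:
        for line in f:
            # strip the trailing newline, otherwise '<<eof>>\n' never equals '<<eof>>'
            line = line.strip()
            if line == '<<eof>>':
                break  # stop at the end-of-data marker
            if not line.startswith('<<row>>'):
                continue  # skip anything that is not a row
            # remove the 7-char '<<row>>' prefix and 8-char '<</row>>' suffix, then split fields
            output.append(line[len('<<row>>'):-len('<</row>>')].split('|'))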
    

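    This collects every line that starts with <<row>>, up to the first line that contains only <<eof>>. It assumes each row sits on a single line, so it does not handle rows that span multiple lines.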
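The line-by-line approach can be extended to multi-line rows by buffering lines until the closing `<</row>>` appears, which keeps memory use to one row at a time. A minimal sketch, assuming (as the code above does) that `<<eof>>` ends the data; `iter_rows` is a hypothetical name, and it also assumes that surrounding whitespace and line breaks inside a row carry no meaning. Unlike the skip-based loop above, it raises `ValueError` for stray lines and for an unterminated row.

```python
ROW_START = '<<row>>'
ROW_END = '<</row>>'
EOF_MARK = '<<eof>>'


def iter_rows(lines):
    """Yield one field list per row, buffering lines until the closing tag."""
    buffer = None  # text collected for the current row; None when between rows
    for raw in lines:
        line = raw.strip()  # assumption: whitespace around each physical line is not data
        if buffer is None:
            if line == EOF_MARK:
                return  # end-of-data marker
            if not line:
                continue  # tolerate blank lines between rows
            if not line.startswith(ROW_START):
                raise ValueError(f'expected a line starting with {ROW_START!r}, got {line!r}')
            buffer = line[len(ROW_START):]
        else:
            buffer += line  # continuation of a row that spans several lines
        if buffer.endswith(ROW_END):
            yield buffer[:-len(ROW_END)].split('|')
            buffer = None
    if buffer is not None:
        raise ValueError('input ended inside an unterminated <<row>>')


sample = ['<<row>>12|xyz|\n', 'abc|2.34<</row>>\n', '<<row>>13|foo|bar|5.0<</row>>\n', '<<eof>>\n']
print(list(iter_rows(sample)))
# [['12', 'xyz', 'abc', '2.34'], ['13', 'foo', 'bar', '5.0']]
```

With a file: `with open('input.txt') as f: output = list(iter_rows(f))`.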