Parse only selected records from empty-line separated file

I have a file with the following structure:

SE|text|Baz
SE|entity|Bla
SE|relation|Bla
SE|relation|Foo

SE|text|Bla
SE|entity|Foo

SE|text|Zoo
SE|relation|Bla
SE|relation|Baz

Records (i.e., blocks) are separated by an empty line. Each line in a block starts with the SE tag, and the text tag always occurs in the first line of each block.

I wonder how to properly extract only blocks with a relation tag, which is not necessarily present in each block. My attempt is pasted below:

from itertools import groupby
with open('test.txt') as f:
    for nonempty, group in groupby(f, bool):
        if nonempty:
            process_block() ## ?

The desired output is a JSON dump:

{
    "result": [
        {
            "text": "Baz", 
            "relation": ["Bla","Foo"]
        },
        {
            "text": "Zoo", 
            "relation": ["Bla","Baz"]
        }

    ]
}

Solution

Here is a solution in pure Python that keeps a block if any of its lines has relation in the second column. It could most likely be done more elegantly with a library such as pandas.

from pprint import pprint

fname = 'ex.txt'

# extract blocks
with open(fname, 'r') as f:
    blocks = [[]]
    for line in f:
        if not line.strip():  # a blank or whitespace-only line ends the current block
            blocks.append([])
        else:
            blocks[-1] += [line.strip().split('|')]

# remove blocks that don't contain a 'relation' line
blocks = [block for block in blocks
          if any('relation' == x[1] for x in block)]

pprint(blocks)
# [[['SE', 'text', 'Baz'],
#   ['SE', 'entity', 'Bla'],
#   ['SE', 'relation', 'Bla'],
#   ['SE', 'relation', 'Foo']],
#  [['SE', 'text', 'Zoo'], ['SE', 'relation', 'Bla'], ['SE', 'relation', 'Baz']]]


# To export to JSON in the format requested in the question:
import pandas as pd
import json
results = []
for block in blocks:
    df = pd.DataFrame(block)
    json_dict = {}
    # the text line comes first in each block, so take the first match as a plain string
    json_dict['text'] = df[2][df[1] == 'text'].iloc[0]
    json_dict['relation'] = list(df[2][df[1] == 'relation'])
    results.append(json_dict)
print(json.dumps({'result': results}))
# {"result": [{"text": "Baz", "relation": ["Bla", "Foo"]}, {"text": "Zoo", "relation": ["Bla", "Baz"]}]}

Let's go through it:

Read the file line by line, start a new block at every blank line, and split each line into columns on the | character.
Go through the blocks and keep only those that contain at least one relation line.
Print the filtered blocks, then build one dictionary per block and dump the list as JSON.
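
For comparison, the same steps can be sketched with only the standard library, building on the groupby idea from the question. This is a minimal sketch rather than a definitive implementation: the helper names iter_blocks and blocks_with_relation are made up for illustration, and it assumes every non-blank line has exactly three |-separated fields. Note that groupby(f, bool) on its own does not split the blocks, because a blank line read from a file is "\n", which is truthy; grouping on the stripped line avoids that.

```python
import json
from itertools import groupby


def iter_blocks(lines):
    """Yield each blank-line separated block as a list of [tag, kind, value] rows."""
    # Key on the stripped line: a raw blank line such as "\n" is truthy,
    # so plain bool() would put every line into one group.
    for nonempty, group in groupby(lines, key=lambda line: bool(line.strip())):
        if nonempty:
            yield [line.strip().split('|') for line in group]


def blocks_with_relation(lines):
    """Build one dict per block that has at least one 'relation' row."""
    results = []
    for block in iter_blocks(lines):
        relations = [row[2] for row in block if row[1] == 'relation']
        if relations:
            # The question states that the text row is always first in a block.
            results.append({'text': block[0][2], 'relation': relations})
    return results


# Sample data from the question; for a real file, pass the open file object.
sample = """SE|text|Baz
SE|entity|Bla
SE|relation|Bla
SE|relation|Foo

SE|text|Bla
SE|entity|Foo

SE|text|Zoo
SE|relation|Bla
SE|relation|Baz
""".splitlines()

output = json.dumps({'result': blocks_with_relation(sample)}, indent=4)
print(output)
```

Because iter_blocks is a generator, only one block is held in memory at a time, which may matter for large files; calling blocks_with_relation(f) inside a with open(...) as f: block should work the same way as with the sample list.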