python ordereddict

How do I build a dictionary from a file whose content has header sections and body sections?


Given a file with the contents below:

******************
* Header title 1
* + trig apple
* + targ beans
* + trig grapes
* + targ berries

* Header title 2
* + trig beans
* + targ joke
* + trig help
* + targ me

This pattern repeats, with every header title being a unique string. As I read the file, I would like to build an ordered dict whose keys are the header titles and whose values are lists of the lines in each body section. Something like this:

d = {
    'Header title 1': ['+ trig apple', '+ targ beans', '+ trig grapes', '+ targ berries'],
    'Header title 2': ['+ trig beans', '+ targ joke', '+ trig help', '+ targ me'],
      .
      .
      .
    <key>: <value>
}

I am stuck. My current solution iterates over the file line by line and appends each body line to the list for its header, but it ends up storing the body lines of all sections in the list of every header, so it does not give me what I need.

I described what I tried above.


Solution

  • The code below first writes your sample input to a file, then reads it back into an OrderedDict. It assumes that header lines start with "* " and record lines start with "* +", and that no record appears before the first header. You will probably also want to strip the trailing newline (\n) from each line; note that the keys and values below still include it.

    from collections import OrderedDict
    
    file_content = """* Header title 1
    * + trig apple
    * + targ beans
    * + trig grapes
    * + targ berries
    
    * Header title 2
    * + trig beans
    * + targ joke
    * + trig help
    * + targ me"""
    
    # Write file
    with open("file.txt", "w+") as new_file:
        new_file.write(file_content)
    
    # Read file to ordered dict
    d = OrderedDict()
    with open("file.txt") as f:
        for line in f:
            if line.startswith("* +"):
                # current_key is unbound if a record appears before any header;
                # we assume every header starts with '*' and precedes its '* +' records
                d[current_key].append(line.replace("* ", ""))
            elif line.startswith("*"):
                current_key = line.replace("* ", "")
                d[current_key] = []
    print(d["Header title 1\n"])
    print(d["Header title 2\n"])
    
    # ['+ trig apple\n', '+ targ beans\n', '+ trig grapes\n', '+ targ berries\n']
    # ['+ trig beans\n', '+ targ joke\n', '+ trig help\n', '+ targ me']
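
  • As a follow-up to the clean-up mentioned above, here is a minimal sketch of a variant that strips the newlines, skips the "******" banner line from the question, and raises an error if a record shows up before any header. The function name parse_sections and the choice to raise ValueError are my own assumptions, not something from the question; adapt them as needed.

```python
from collections import OrderedDict
import io


def parse_sections(lines):
    """Group '* + ...' record lines under the most recent '* ...' header.

    parse_sections is a hypothetical helper name used for illustration.
    """
    sections = OrderedDict()
    current_key = None  # no header seen yet
    for raw in lines:
        line = raw.strip()  # removes the trailing newline and stray spaces
        if line.startswith("* +"):
            if current_key is None:
                # Assumption: a record before any header is an input error.
                raise ValueError(f"record before first header: {line!r}")
            # Drop only the leading '* ' marker, keep the '+ ...' text.
            sections[current_key].append(line[2:])
        elif line.startswith("* "):
            current_key = line[2:]
            # A fresh list per header, so no two headers share a list object.
            sections[current_key] = []
        # Anything else (blank lines, the '******' banner) is skipped.
    return sections


sample = """******************
* Header title 1
* + trig apple
* + targ beans
* + trig grapes
* + targ berries

* Header title 2
* + trig beans
* + targ joke
* + trig help
* + targ me
"""

# io.StringIO stands in for an open file; open("file.txt") works the same way.
d = parse_sections(io.StringIO(sample))
print(d["Header title 1"])
print(d["Header title 2"])
```

    Because the keys no longer carry "\n", you can look them up as d["Header title 1"]. On Python 3.7 and later a plain dict also preserves insertion order, so OrderedDict is only needed if you rely on its extra methods or on older versions.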