Python: Extract values from a text file to create nested dictionary


I have a very unorganized text file containing objects with one or more sub-indexes, something like this:

1:
Name of Object 1

Sub-index 0: 
Scale: Q0
Unit: Percent
Description: Object 1 does this

2:
Object 2 yo

Sub-index 0: 
Scale: Q0
Unit: Percent
Description: Something important

Sub-index 1: 
Scale: 0.125
Unit: Percent
Description: Object 2 does that

I want to extract each object's name, scale and description and put them into a dictionary. Something like this:

ObjectDict = {
    1: ['Name of Object 1', 'Q0', 'Object 1 does this'],
    2: {
        0: ['Object 2 yo', 'Q0', 'Something important'],
        1: ['Object 2 yo', '0.125', 'Object 2 does that']
    }
}

I was able to extract the dictionary keys by doing this:

for line in textfile:
    a = line.replace(':', '').strip()
    if a.isnumeric():
        key = a  # this is 1 key

I can "probably" extract Scale and Description value of an object by doing:

if 'Scale' in line:
    scale = line.split(':', 1)[1].strip()  # Store the value
if 'Description' in line:
    description = line.split(':', 1)[1].strip()  # Store the value

However, this only works if the object has a single sub-index. For objects with multiple sub-indexes, like Object 2, I have not figured out how to do it yet. Is there a nice way to do this in Python 3.7? Thanks!

EDIT: The dictionary format I chose above is just an example. Any other dictionary layout is okay. I just want to extract the necessary data from an unorganized file and store it in a cleaner form so other files can access it.


Solution

  • If you store every object in the text file as a dictionary, you can loop through the lines of the file and use Python built-ins such as readlines() and startswith() to do what you want.

    # Read all lines up front so the line after a header can be looked up
    with open('sample.txt') as f:
        lines = f.readlines()

    d = {}
    for i, line in enumerate(lines):
        line = line.strip()

        # Object header: a bare number followed by a colon, e.g. "1:"
        if line.endswith(':') and line[:-1].isnumeric():
            ind = line[:-1]
            name = lines[i + 1].strip()  # the object name is on the next line
            d.setdefault(ind, {})

        elif line.startswith('Sub-index'):
            # "Sub-index 0:" -> "0"
            sub_ind = line.split()[-1].rstrip(':')
            # Each sub-index list starts with the object name
            d[ind].setdefault(sub_ind, [name])

        elif line.startswith('Scale') or line.startswith('Description'):
            # Keep everything after the first colon
            value = line.partition(':')[2].strip()
            d[ind][sub_ind].append(value)
    

    Output:

    {
        '1': {
            '0': ['Name of Object 1', 'Q0', 'Object 1 does this']
            },
        '2': {
            '0': ['Object 2 yo', 'Q0', 'Something important'],
            '1': ['Object 2 yo', '0.125', 'Object 2 does that']
            }
    }
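
If you would rather have integer keys and named fields instead of positional lists, the same line-by-line idea can be sketched with regular expressions. This is only a rough sketch: parse_objects, the three patterns, and the 'name'/'sub' layout are hypothetical choices of mine, and it assumes the file looks exactly like the sample in the question (a numeric header, the name on the next non-empty line, then "Sub-index N:" blocks).

```python
import re

# Hypothetical patterns, assuming the layout shown in the question
OBJECT_RE = re.compile(r'^(\d+):$')
SUB_INDEX_RE = re.compile(r'^Sub-index\s+(\d+):$')
FIELD_RE = re.compile(r'^(Scale|Unit|Description):\s*(.*)$')


def parse_objects(text):
    """Return {object_index: {'name': str, 'sub': {sub_index: {field: value}}}}."""
    objects = {}
    current = None       # object currently being filled
    fields = None        # sub-index dict currently being filled
    expect_name = False  # True right after an object header
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if expect_name:
            current['name'] = line
            expect_name = False
            continue
        m = OBJECT_RE.match(line)
        if m:
            current = objects.setdefault(int(m.group(1)), {'name': None, 'sub': {}})
            fields = None
            expect_name = True
            continue
        m = SUB_INDEX_RE.match(line)
        if m and current is not None:
            fields = current['sub'].setdefault(int(m.group(1)), {})
            continue
        m = FIELD_RE.match(line)
        if m and fields is not None:
            fields[m.group(1).lower()] = m.group(2)
    return objects


SAMPLE = """\
1:
Name of Object 1

Sub-index 0: 
Scale: Q0
Unit: Percent
Description: Object 1 does this

2:
Object 2 yo

Sub-index 0: 
Scale: Q0
Unit: Percent
Description: Something important

Sub-index 1: 
Scale: 0.125
Unit: Percent
Description: Object 2 does that
"""

objects = parse_objects(SAMPLE)
print(objects[2]['name'])                 # Object 2 yo
print(objects[2]['sub'][1]['scale'])      # 0.125
print(objects[1]['sub'][0]['description'])  # Object 1 does this
```

Keeping the name once per object and the fields by key means other code does not have to remember which list position holds the scale or the description. For a real file you would pass in the result of f.read() instead of SAMPLE.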