Reading Large Text Content Into Dictionary


I have a large text file with daily entries:

Date - 01031991
Location- Worcester, MA
Status - very long sentence that
        continues over the next few
        lines like so.
Author- Security 87

Date- 01071991
Location - Fort-Devens, MA
Status: another long%$@ sent%$#ence 
        with space and all typ&^%$es
        of characters.
Author - Security 92

I am using Python to turn this text file into an Excel workbook whose columns and values come from the fields in this text file. I have written the following script:

myfile = open(txtfile, 'r')
dictionary = {}

for line in myfile:
    
    k, v = line.strip().split("-", maxsplit=1)
    dictionary[k] = v
    
myfile.close()

For now, I can't get the entire sentence in "Status", because each line of it ends with a space and a newline, and the next line starts with a run of spaces before the first word. As in, "very long sentence that \n continues over the next few \n ...".

How do I get the entire sentence into my dictionary? Right now, I only get:

print(dictionary)
{'Date ': ' 01031991', 'Location': ' Worcester, MA', 'Status ': ' very long sentence that'}

Solution

  • A non-regex approach:

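    # read text from file
    path = 'entries.txt'  # placeholder: replace with your file name
    with open(path, 'r') as fd:
        text = fd.read()
    
    # process text line by line
    # note: keys repeat across entries, so later entries overwrite earlier ones
    data = {}
    last_key = ''
    for line in text.split('\n'):
        if not line.strip():
            continue  # skip the blank lines between entries
        if line.startswith(' '):
            # indented line: continuation of the previous value
            data[last_key] += ' ' + line.strip(' -:\n\t')
        else:
            key, _, content = line.partition(' ')
            key = key.rstrip('-:')  # drop a separator glued to the key, e.g. 'Location-'
            data[key] = content.lstrip(' -:').rstrip()
            last_key = key
    
    # check result
    for k, v in data.items():
        print(k, v)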
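
Because every entry reuses the same keys (Date, Location, Status, Author), a single dictionary can only hold one entry at a time. The sketch below is one possible variation on the loop above, not part of the original answer: it assumes entries are separated by blank lines and collects one dictionary per entry in a list. The function name `parse_entries` and the inline `sample` text are made up for illustration.

```python
def parse_entries(text):
    """Parse the text into a list of dicts, one per blank-line-separated entry."""
    records = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # blank line: the current entry (if any) is finished
            if current:
                records.append(current)
            current, last_key = {}, None
            continue
        if line[0].isspace() and last_key is not None:
            # indented line: continuation of the previous value
            current[last_key] += ' ' + line.strip()
        else:
            key, _, content = line.partition(' ')
            key = key.rstrip('-:')  # 'Location-' and 'Status:' become clean keys
            current[key] = content.lstrip(' -:').rstrip()
            last_key = key
    if current:
        # the file may not end with a blank line
        records.append(current)
    return records


# sample text copied from the question
sample = """Date - 01031991
Location- Worcester, MA
Status - very long sentence that
        continues over the next few
        lines like so.
Author- Security 87

Date- 01071991
Location - Fort-Devens, MA
Status: another long%$@ sent%$#ence
        with space and all typ&^%$es
        of characters.
Author - Security 92
"""

records = parse_entries(sample)
for record in records:
    print(record)
```

A list of dictionaries like this maps naturally onto rows and columns. For the workbook itself, one option is `pandas.DataFrame(records).to_excel(...)`, which assumes pandas and an Excel writer such as openpyxl are installed. Another is `csv.DictWriter` from the standard library, which produces a CSV file that Excel can open.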