Say I have hundreds of text files like this example :
NAME
John Doe
DATE OF BIRTH
1992-02-16
BIO
THIS is
a PRETTY
long sentence
without ANY structure
HOBBIES
//..etc..
NAME, DATE OF BIRTH, BIO, and HOBBIES (and others) are always there, but text content and the number of lines between them can sometimes change.
I want to iterate through the file and store the string between each of these keys. For example, a variable called Name should contain the value stored between 'NAME' and 'DATE OF BIRTH'.
This is what I turned up with :
lines = f.readlines()
for line_number, line in enumerate(lines):
if "NAME" in line:
name = lines[line_number + 1] # In all files, Name is one line long.
elif "DATE OF BIRTH" in line:
date = lines[line_number + 2] # Date is also always two lines after
elif "BIO" in line:
for x in range(line_number + 1, line_number + 20): # Length of other data can be randomly bigger
if "HOBBIES" not in lines[x]:
bio += lines[x]
else:
break
elif "HOBBIES" in line:
#...
This works well enough, but I feel like instead of using many double loops, there must be a smarter and less hacky way to do it.
I'm looking for a general solution where NAME would store everything until DATE OF BIRTH, and BIO would store everything until HOBBIES, etc. With the intention of cleaning up and removing extra white lintes later.
Is it possible?
Edit : While I was reading through the answers, I realized I forgot a really significant detail, the keys will sometimes be repeated (in the same order).
That is, a single text file can contain more than one person. A list of persons should be created. The key Name signals the start of a new person.
I did it storing everything in a dictionary, see code below.
f = open("test.txt")
lines = f.readlines()
dict_text = {"NAME":[], "DATEOFBIRTH":[], "BIO":[]}
for line_number, line in enumerate(lines):
if not ("NAME" in line or "DATE OF BIRTH" in line or "BIO" in line):
text = line.replace("\n","")
dict_text[location].append(text)
else:
location = "".join((line.split()))