I am new to Python and stuck on my homework. I am trying to split a 10,000-line text file into multiple files based on a list of keywords.
input.txt looks something like this:
Name: Apple
Type: Fruits
Description:...
Name: Orange
Type: Fruits
Description:...
Name: Yellow
Type: Colour
Description:...
Name: Apple
Type: Fruits
Description:...
Name: Orange
Type: Fruits
Description:...
Name: Yellow
Type: Colour
Description:...
Keywords:
Apple
Orange
Yellow
Expected output files:
Apple.txt
Type: Fruits
Description:
Orange.txt
Type: Fruits
Description:
Yellow.txt
Type: Colour
Description:
But my current code can only split when the key is 'Apple'. I am not sure how to modify it to handle a list of keywords.
key = ['Apple']
outfile = None
fno = 0
lno = 0
with open('input.txt') as infile:
    while line := infile.readline():
        lno += 1
        if outfile is None:
            fno += 1
            outfile = open(f'{fno}.txt', 'w')
        outfile.write(line)
        if key in line:
            print(f'"{key}" found in line {lno}')
            outfile.close()
            outfile = None
if outfile:
    outfile.close()
Edit: It should print the first record for each keyword.
Here is a somewhat more idiomatic version of your code. It does not hardcode a list of keywords; it simply picks up whatever comes after Name:
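seen = set()      # keywords whose first record has already been written
outfile = None    # file for the record currently being copied, if any
with open('input.txt') as infile:
    for line in infile:
        if line.startswith(' Name: '):
            # Everything after the prefix, minus the trailing newline
            keyword = line[len(' Name: '):-1]
            if keyword not in seen:
                outfile = open(f'{keyword}.txt', 'w')
                seen.add(keyword)
        if outfile is not None:
            if line.strip() == '':
                # A blank line ends the current record
                outfile.close()
                outfile = None
            else:
                outfile.write(line)
if outfile is not None:
    outfile.close()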
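If you do want to restrict the output to a fixed list of keywords, as in the question, one possible sketch is below. It rests on several assumptions that are mine rather than anything stated in the question: the function name split_first_records and its out_dir parameter are made up for illustration, a record is taken to end at the next Name: line or at a blank line, and the Name: line itself is left out of the output because the expected output files in the question do not show it.

```python
from pathlib import Path


def split_first_records(lines, keywords, out_dir="."):
    """Write the first record of each wanted keyword to <keyword>.txt.

    Hypothetical helper. Assumes a record starts at a 'Name: <keyword>' line
    and runs until the next 'Name:' line or a blank line.
    Returns the keywords that were written, in order of appearance.
    """
    wanted = set(keywords)
    seen = []
    outfile = None
    try:
        for line in lines:
            stripped = line.strip()
            if stripped.startswith("Name:") or not stripped:
                # A new record (or a blank separator) ends the current one.
                if outfile is not None:
                    outfile.close()
                    outfile = None
                keyword = stripped[len("Name:"):].strip()
                if stripped and keyword in wanted and keyword not in seen:
                    seen.append(keyword)
                    outfile = open(Path(out_dir) / f"{keyword}.txt", "w")
                continue  # the Name line itself is not copied
            if outfile is not None:
                outfile.write(stripped + "\n")
    finally:
        if outfile is not None:
            outfile.close()
    return seen


# Possible usage, assuming input.txt is in the current directory:
# with open("input.txt") as infile:
#     split_first_records(infile, ["Apple", "Orange", "Yellow"])
```

Because it takes any iterable of lines, the same function can be tried on a small list of strings before pointing it at the real file.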
You were never doing anything useful with lno, but if you wanted it for some reason, the idiomatic way to get line numbers is
for lno, line in enumerate(infile, start=1):
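As a small self-contained illustration of that loop header, here is a sketch that records the line number of every Name: line. The io.StringIO object and its three sample lines are stand-ins I made up so the snippet runs without input.txt, and it assumes no leading space before Name:.

```python
import io

# io.StringIO stands in for the opened input.txt so the sketch runs on its own.
infile = io.StringIO("Name: Apple\nType: Fruits\nName: Orange\n")

name_lines = []
for lno, line in enumerate(infile, start=1):
    if line.startswith("Name: "):
        # Keep the 1-based line number together with the keyword.
        name_lines.append((lno, line[len("Name: "):].rstrip("\n")))

print(name_lines)  # [(1, 'Apple'), (3, 'Orange')]
```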
Your sample input.txt shows a space at the beginning of each line. If that was incorrectly transcribed, obviously adapt accordingly.
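One way to avoid depending on that leading space at all is to strip each line before testing it. The helper below is only a sketch; the name keyword_from and its prefix parameter are hypothetical.

```python
def keyword_from(line, prefix="Name:"):
    """Return the keyword on a 'Name:' line, or None for any other line.

    Hypothetical helper: surrounding whitespace is stripped first, so
    ' Name: Apple' and 'Name: Apple' are treated the same way.
    """
    stripped = line.strip()
    if stripped.startswith(prefix):
        return stripped[len(prefix):].strip()
    return None
```

With this, the test in the loop becomes a call to keyword_from(line) followed by a check for None, and it no longer matters whether the file was transcribed with the extra space.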