I have a list of values, and I need to search through an ~1 GB text file for those values and pull a corresponding value from the same row. Is there a more efficient method than nested for loops, where I iterate through my list of values and, for each one, iterate through the entire text file looking for matches?
for name in name_list:
    with open("test.txt") as infile:
        for line in infile:
            currentline = line.split("|")
            if name == currentline[0]:
                print(currentline[2])
This is exactly what a dictionary is for. Assuming parts[0] is unique:
dic = {}
with open("test.txt") as infile:
    for line in infile:
        parts = line.split("|")
        dic[parts[0]] = parts[2]

for name in name_list:
    if name in dic:
        print(dic[name])
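For illustration, here is a self-contained version of that approach, using hypothetical sample rows in place of the real 1 GB file (the file contents and names below are made up):

```python
# Hypothetical sample rows standing in for the 1 GB test.txt.
with open("test.txt", "w") as f:
    f.write("alice|foo|1\nbob|bar|2\ncarol|baz|3\n")

name_list = ["alice", "carol", "missing"]

# Build the lookup table in a single pass over the file.
# rstrip("\n") keeps a trailing newline out of the last field.
dic = {}
with open("test.txt") as infile:
    for line in infile:
        parts = line.rstrip("\n").split("|")
        dic[parts[0]] = parts[2]

found = [dic[name] for name in name_list if name in dic]
print(found)
```

Names absent from the file (like "missing" here) are simply skipped by the membership check.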
If it isn't unique, you'll have to keep a list for each dict entry:
dic = {}
with open("test.txt") as infile:
    for line in infile:
        parts = line.split("|")
        if parts[0] in dic:
            dic[parts[0]].append(parts[2])
        else:
            dic[parts[0]] = [parts[2]]

for name in name_list:
    if name in dic:
        for matching in dic[name]:
            print(matching)
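As a minor stylistic variant, the append-or-create pattern above can be written with collections.defaultdict, which creates the empty list automatically on first access (sketched here with hypothetical sample data):

```python
from collections import defaultdict

# Hypothetical sample data with a duplicate key, standing in for test.txt.
with open("test.txt", "w") as f:
    f.write("alice|x|1\nbob|y|2\nalice|z|3\n")

# defaultdict(list) supplies an empty list the first time a key is seen,
# so no explicit "is the key already there?" check is needed.
dic = defaultdict(list)
with open("test.txt") as infile:
    for line in infile:
        parts = line.rstrip("\n").split("|")
        dic[parts[0]].append(parts[2])
```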
This drastically reduces the runtime: assuming your file has n entries and your name_list has m entries, you previously had a complexity of O(n * m); now it is O(n + m), since hash-map access is constant time and no linear search is performed anymore. In practice the speedup is even larger, because your previous code re-read the entire file once per name inside the O(m) loop.
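A quick in-memory comparison illustrates the difference (the row counts and field values below are synthetic; absolute timings will vary by machine):

```python
import time

# Synthetic stand-ins for the file rows and the name list.
n, m = 20_000, 200
rows = [f"name{i}|junk|value{i}".split("|") for i in range(n)]
name_list = [f"name{i}" for i in range(0, n, n // m)]

# O(n * m): scan every row for every name.
t0 = time.perf_counter()
hits_nested = [r[2] for name in name_list for r in rows if r[0] == name]
t_nested = time.perf_counter() - t0

# O(n + m): build the dict once, then do constant-time lookups.
t0 = time.perf_counter()
dic = {r[0]: r[2] for r in rows}
hits_dict = [dic[name] for name in name_list if name in dic]
t_dict = time.perf_counter() - t0

print(f"nested: {t_nested:.4f}s  dict: {t_dict:.4f}s")
```

Both approaches find the same matches; only the dict version avoids rescanning the rows for each name.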
Furthermore, as suggested by @KellyBundy, you can initialize the dict from the entries of name_list so that it only collects values for the names you're interested in:
dic = {name: [] for name in name_list}
with open("test.txt") as infile:
    for line in infile:
        parts = line.split("|")
        if parts[0] in dic:
            dic[parts[0]].append(parts[2])

for matching_list in dic.values():
    for matching in matching_list:
        print(matching)