Search code examples
pythonperformancefor-loopmemorynested-loops

Is there a more efficient method than nested loops to search through a list and then find corresponding matches in a text file?


I have a list of values and I need to search through an ~1 GB text file for those values and then pull a corresponding value from the text file in the same row. Is there a more efficient method than using nested for loops where I would be going through my list of values and then going into my nested loop where I would be iterating through the text file to find the corresponding matches?

    for name in name_list:
        with open("test.txt") as infile:
            for line in infile:
                currentline = line.split("|")
                if name == currentline[0]:
                    print(currentline[2])

Solution

  • This is exactly what a dictionary is for. Assuming parts[0] is unique:

    dic = {}
    with open("test.txt") as infile:
        for line in infile:
            parts = line.split("|")
            dic[parts[0]] = parts[2]
    for name in name_list:
        if name in dic:
            print(dic[name])
    

    if it isn't unique, you'll have to keep a list for each dict entry:

    dic = {}
    with open("test.txt") as infile:
        for line in infile:
            parts = line.split("|")
            if parts[0] in dic:
                dic[parts[0]].append(parts[2])
            else:
                dic[parts[0]] = [parts[2]]
    for name in name_list:
        if name in dic:
            for matching in dic[name]:
                print(matching)
    

    this drastically reduces the runtime: assuming your file has n entries and your name_list has m entries, you had a complexity of O(n * m) before - now you have O(n + m), since hash map access is constant time - you don't perform a linear search anymore!

    In practice, the speedup will be even larger since your previous code reread the file within the O(m) loop.

    Furthermore, as suggested by @KellyBundy, you can initialize the dict for the entries of name_list to only collect values for names you're interested in:

    dic = {name: [] for name in name_list}
    with open("test.txt") as infile:
        for line in infile:
            parts = line.split("|")
            if parts[0] in dic:
                dic[parts[0]].append(parts[2])
    for matching_list in dic.values():
        for matching in matching_list:
            print(matching)