Search code examples
pythonalgorithmcomparison

Is there any algorithm that can be applied to this program?


I am doing an intern writing a program to do gene matching.

For example: File "A" contains some strings of gene type. (the original data is not sorted) rs17760268 rs10439884 rs4911642 rs157640 rs1958589 rs10886159 rs424232 ....

and file "B" contains 900 thousands of rs number like above (also not sorted)

My program now can get correct results, but I would like to make it more efficient.

Is there any algorithm that can be applied to this program?

BTW, I will try to make my program do multi-processing and see if it gets better performance.

pseudocode:
read File "A" by string, append to A[]
A[] = rs numbers from File "A"

read File "B" by string
for gene_B in file_B_reader:
    for gene_A in A:
        if gene_A == gene_B:
            #append to result[]

Solution

  • I don't think there's a need to sort anything first.

    • Process larger list B into a hashmap or hashset, O(n) amortized
    • Iterate over list A and remove from A if not in B, O(m)
    • return A

    Total: O(n + m)