Search code examples
pythonperformancesetfastq

faster membership testing in python than set()


I have to check presence of millions of elements (20-30 letters str) in the list containing 10-100k of those elements. Is there faster way of doing that in python than set() ?

import sys
#load ids
ids = set( x.strip() for x in open(idfile) )

for line in sys.stdin:
    id=line.strip()
    if id in ids:
        #print fastq
        print id
        #update ids
        ids.remove( id )

Solution

  • set is as fast as it gets.

    However, if you rewrite your code to create the set once, and not change it, you can use the frozenset built-in type. It's exactly the same except immutable.

    If you're still having speed problems, you need to speed your program up in other ways, such as by using PyPy instead of cPython.