python search optimization python-2.2

Search times of "x in []" vs "x in {}"


I ran into a problem where I have to go through proxy logs to see if users have visited a list of sites.

I wrote a small script to read all proxy logs, matching the visited host against the list:

for proxyfile in proxyfiles:
    for line in proxyfile.readlines():
        if line[4] in hosts_list:
            print line

The hosts_list is large (about 10,000 hosts), and I noticed that the search took longer than expected.

I wrote a small test:

import random, time
test_list = [x for x in range(10000)]
test_dict = dict(zip(test_list, [True for x in range(10000)]))

def test(test_obj):
    s_time = time.time()
    for i in range(10000):
        random.randint(0, 10000) in test_obj
    d_time = time.time() - s_time
    return d_time

print "list:", test(test_list)
print "dict:", test(test_dict)

The results are the following:

list: 5.58524107933
dict: 0.195574045181

So, to my question: is it possible to perform this search in a more convenient way? Creating a dictionary from a list seems like a hack, as I want to search for the key and not the value it contains.


Solution

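  • "as I want to search for the key and not the value it contains" => then just use a set. A set stores only keys, and its membership test is a hash lookup like a dict's, instead of the linear scan a list performs.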
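
A minimal sketch of how that could look for the proxy-log case. It assumes a Python version that has the built-in set type (2.4 or later; the question is tagged python-2.2, where it is not available) and assumes the log lines are whitespace-delimited with the host in the fifth field. The sample hosts, the log lines, and the matching_lines helper are all made up for illustration.

```python
# Hypothetical sample data; in practice the hosts would be read from the
# hosts file and the lines from the proxy logs.
hosts = set(["example.com", "example.org"])

log_lines = [
    "2010-01-01 12:00:00 alice GET example.com /index.html",
    "2010-01-01 12:00:01 bob GET example.net /",
]

def matching_lines(lines, hosts, field=4):
    """Return the lines whose whitespace-separated field `field` is in `hosts`.

    The field position is an assumption about the log format.
    """
    matches = []
    for line in lines:
        fields = line.split()
        # Skip short or malformed lines instead of raising IndexError.
        if len(fields) > field and fields[field] in hosts:
            matches.append(line)
    return matches

found = matching_lines(log_lines, hosts)
for line in found:
    print(line)
```

Building the set once, before the loop over the log files, is what matters: each "in" test is then a single hash lookup on average, regardless of how many hosts are in the set.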