I ran into a problem where I have to go through proxy logs to see if users have visited a list of sites.
I wrote a small script to read all proxy logs, matching the visited host against the list:
    for proxyfile in proxyfiles:
        for line in proxyfile.readlines():
            if line[4] in hosts_list:
                print line
The hosts_list is large, roughly 10,000 hosts, and I noticed the search took longer than expected.
I wrote a small test:
    import random, time

    test_list = [x for x in range(10000)]
    test_dict = dict(zip(test_list, [True for x in range(10000)]))

    def test(test_obj):
        s_time = time.time()
        for i in range(10000):
            random.randint(0,10000) in test_obj
        d_time = time.time() - s_time
        return d_time

    print "list:", test(test_list)
    print "dict:", test(test_dict)
The results are the following:
list: 5.58524107933
dict: 0.195574045181
So, to my question: is there a more convenient way to perform this search? Creating a dictionary from a list seems like a hack, as I want to search for the key and not the value it contains.
"as I want to search for the key and not the value it contains" => then just use a set.
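To make the set suggestion concrete, here is a minimal sketch. The host names, the sample log lines, the helper name `matching_lines`, and the assumption that the host is the fifth whitespace-separated field are all made up for illustration; a real proxy log will probably need different parsing.

```python
# Hypothetical host names, for illustration only.
hosts_list = ["example.com", "example.org", "example.net"]

# Build the set once. Membership tests then use hashing (constant time on
# average) instead of scanning the whole list for every log line.
hosts_set = set(hosts_list)

def matching_lines(lines, hosts, host_field=4):
    """Yield lines whose whitespace-separated field `host_field` is in `hosts`.

    The field position is an assumption about the log format.
    """
    for line in lines:
        fields = line.split()
        if len(fields) > host_field and fields[host_field] in hosts:
            yield line

# Made-up log lines; the real format will differ.
sample_log = [
    "2024-01-01 10:00:00 alice GET example.com /index.html",
    "2024-01-01 10:00:01 bob GET other.test /",
]

for line in matching_lines(sample_log, hosts_set):
    print(line)
```

A set stores only the keys, so there is no dummy `True` value as in the dict workaround, and the `in` test is written exactly as it was for the list.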