I tried timing if x in
with timeit to test the difference between lists, tuples, dictionaries and sets.
Input:
arr = [45,42,2,18,23,1170,12,41,40,9,47,24,33,28,10,32,29,17,46,11,759,37,6,26,21,49,31,14,19,8,13,7,27,22,3,36,34,38,39,30,43,15,4,16,35,25,20,44,5,48]
d = {x: None for x in arr}
s = set(arr)
t = tuple(arr)
print(timeit.timeit(stmt="1170 in arr",globals = globals()))
print(timeit.timeit(stmt="1170 in d",globals = globals()))
print(timeit.timeit(stmt="1170 in s",globals = globals()))
print(timeit.timeit(stmt="1170 in t",globals = globals()))
output:
0.04749280004762113
0.03289399994537234
0.02820739999879152
0.05428230005782098
I read online that lookups for sets and dictionaries were O(1), while lookups for tuples and lists were O(n) or at best O(log n).
My questions are:
Edit: If I lookup an item that is not in the data structure, like 9999 (as suggested by Andrej Kesely) I observe dictionaries being consistently faster than sets, while lists are still slightly faster than tuples.
output:
0.2764877999434248
0.0286063000094146
0.0490725999698043
0.3088887999765575
Big-O notation is used to represent in general, how efficient an algorithm is. It does so purely from a theoretical perspective and in limits to operation flow, not actual time. It is possible, though unlikely for an optimized version of an algorithm to have worse rating using Big-O due to problem specific things. O(1) means the operation will always take a constant time. O(n) means the operation will take an amount of time correlating directly to the number of elements being operated over. O(log n) is generally the fastest algorithm achievable.
Dictionaries are constant time (O(1)) because they essentially give a function for getting an integer from an input value, and that integer is used basically as an array index. So a huge memory area is taken up, but if you generate the index from this hash function using a key, then the result is given immediately.
Set's, I've not looked into for sure, but thinking about them... Probably they are just dictionaries such that the values are boolean initialized to false or something.
Searching linearly through a list or tuple is O(n) because you go from left to right. Access into a list or tuple also is O(n), however access into an array when you know it's index is O(1) (notably, not all lists are the same but this is for an unoptimized linked list).
Everything I know off hand says tuple should be faster unless your functionally modifying it instead of mutating a list? I don't understand this, if someone comments then I'll update.