An introductory textbook presents the following function as an example of polynomial complexity:
def intersect(L1, L2):
    """
    Assumes L1 and L2 are lists
    Returns a list without duplicates that is the intersection of L1 and L2
    """
    # Part i - Build a list containing common elements
    tmp = []
    for e1 in L1:
        for e2 in L2:
            if e1 == e2:
                tmp.append(e1)
                break
    # Part ii - Build a list without duplicates
    result = []
    for e in tmp:
        if e not in result:
            result.append(e)
    return result
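For concreteness, intersect([1, 2, 2, 3], [2, 3, 4]) returns [2, 3] (an illustrative call of my own, not from the book).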
Questions
The author says that the running time for part (i) is order Θ(len(L1)*len(L2)), but this seems to imply that part (i) runs in Θ(len(L1)*len(L2)) in all cases. Is this correct? When I tried working through the example, I get different running times for the worst- and best-cases:

Worst-case (e.g. every element of L1 matches only the last element of L2, so the inner loop never breaks early): Θ(len(L1)*len(L2))

Best-case (e.g. every element of L1 matches the first element of L2, so the break fires on the first comparison): Θ(len(L1))

...so I would have said part (i) is order Θ(len(L1)*len(L2)) in the worst-case.
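To sanity-check this, I counted the inner-loop comparisons with a small instrumented copy of part (i) (count_part_i is my own helper, not from the book):

def count_part_i(L1, L2):
    # Same loops as part (i), but count how often the inner body runs
    comparisons = 0
    for e1 in L1:
        for e2 in L2:
            comparisons += 1
            if e1 == e2:
                break
    return comparisons

n = 100
# Worst case as defined above: each e1 matches only the last element of L2
print(count_part_i([0] * n, [1] * (n - 1) + [0]))  # n * n = 10000
# Best case: each e1 matches the first element of L2
print(count_part_i([0] * n, [0] + [1] * (n - 1)))  # n = 100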
Regarding part (ii), the author says that its running time is not Θ(len(tmp)) but rather Θ(len(tmp)*len(result)), since e not in result may require us to examine all elements in result, and this would run in Θ(len(result)). But since len(result) and len(tmp) are bounded by the smaller of len(L1) and len(L2), Θ(len(tmp)*len(result)) can be ignored. I understand that Θ(len(tmp)*len(result)) is an additive term and can therefore be ignored, but again I'm not sure why the author makes a blanket statement regarding part (ii) - are they saying part (ii) is Θ(len(tmp)*len(result)) in all cases?
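(Spelling out the arithmetic with the bounds the author gives: len(tmp)*len(result) ≤ min(len(L1), len(L2))² ≤ len(L1)*len(L2), so Θ(len(L1)*len(L2)) + Θ(len(tmp)*len(result)) = Θ(len(L1)*len(L2)).)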
Since the loop in part (ii) depends on the output of the loop in part (i), I figured result would be length 1 in the worst-case (as I've defined it above) and therefore part (ii)'s worst-case running time would be order Θ(len(tmp)), not Θ(len(tmp)*len(result)). This seems wrong, but I'm not sure how to characterise such loops.
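For instance, calling the book's function on the worst case I described:

# Worst case as defined above: tmp ends up as [0, 0, 0, 0, 0],
# but result never grows past one element
print(intersect([0] * 5, [1, 1, 1, 1, 0]))  # prints [0]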
I'm new to this topic so any guidance would be much appreciated.
EDIT: The same passage in an older edition of the book uses Big-O in place of Big-Theta.
The author says that the running time for part (i) is order Θ(len(L1)*len(L2)), but this seems to imply that part (i) runs in Θ(len(L1)*len(L2)) in all cases. Is this correct?
No. The author uses Big Θ in the sense they explain in section 11.2 of that same chapter, in the third edition (the text differs from the second edition) -- note the use of "worst case" below:
Many computer scientists will abuse Big O notation by making statements like, "the complexity of f(x) is O(x²)." By this they mean that in the worst case f will take no more than O(x²) steps to run. [...]
[...] we will use Big Theta (Θ) when we are describing something that is both an upper and a lower bound on the asymptotic worst-case running time.
This also explains your next thought:
...so I would have said part (i) is order Θ(len(L1)*len(L2)) in the worst-case.
You'll now understand that this is exactly what the author wants to say.
Since the loop in part (ii) depends on the output of the loop in part (i), I figured result would be length 1 in the worst-case (as I've defined it above) and therefore part (ii)'s worst-case running time would be order Θ(len(tmp)), not Θ(len(tmp)*len(result)).
It is true that you defined a worst case earlier on, but there are other worst-case instances for part (i) that also elicit worst-case behaviour in part (ii).
Take for instance the case where L1 has n unique elements, and where L2 is a copy of L1.

In part (i), the inner loop will iterate once for the first element, twice for the second, and so on. In total the inner loop makes 1+2+3+...+n iterations, which is n(n+1)/2, i.e. Θ(n²) -- a worst case. tmp will be a copy of L1.
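A quick way to verify that count (the instrumented helper below is my own, just for illustration):

def count_part_i_steps(L1, L2):
    # Same loops as part (i), but count how often the inner body runs
    steps = 0
    for e1 in L1:
        for e2 in L2:
            steps += 1
            if e1 == e2:
                break
    return steps

n = 100
L1 = list(range(n))   # n unique elements
L2 = list(L1)         # L2 is a copy of L1
print(count_part_i_steps(L1, L2))  # 1 + 2 + ... + n = n*(n+1)//2 = 5050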
Now look at part (ii): the if block will be executed in each iteration, since tmp has no duplicates. The in operation will scan all values in result. In the first iteration result is empty, so there is nothing to scan, but in the second it has 1 element, then 2, and so on. In total we get 0+1+2+...+(n-1) steps inside the in operator, which is n(n-1)/2, again Θ(n²).
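And a corresponding count for part (ii) (again my own instrumented sketch, not the book's code):

def count_part_ii_steps(tmp):
    # Count the element comparisons hidden inside "e not in result"
    steps = 0
    result = []
    for e in tmp:
        steps += len(result)   # the membership test scans all of result,
        if e not in result:    # since tmp has no duplicates
            result.append(e)
    return steps

n = 100
tmp = list(range(n))   # tmp is a copy of L1: n unique elements
print(count_part_ii_steps(tmp))  # 0 + 1 + ... + (n-1) = n*(n-1)//2 = 4950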