An introductory textbook presents the following function as an example of polynomial complexity:
def intersect(L1, L2):
    """
    Assumes L1 and L2 are lists
    Returns a list without duplicates that is the intersection of L1 and L2
    """
    # Part i - Build a list containing common elements
    tmp = []
    for e1 in L1:
        for e2 in L2:
            if e1 == e2:
                tmp.append(e1)
                break
    # Part ii - Build a list without duplicates
    result = []
    for e in tmp:
        if e not in result:
            result.append(e)
    return result
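For concreteness, intersect([1, 2, 2, 3], [2, 3, 4]) returns [2, 3] (an illustrative call of my own, not from the book).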
Questions
The author says that the running time for part (i) is order Θ(len(L1)*len(L2)), but this seems to imply that part (i) runs in Θ(len(L1)*len(L2)) in all cases. Is this correct? When I tried working through the example, I get different running times for the worst- and best-cases:

Worst-case (e.g. every element of L1 matches only the last element of L2, so the inner loop never breaks early): Θ(len(L1)*len(L2))

Best-case (e.g. every element of L1 matches the first element of L2, so the break fires on the first comparison): Θ(len(L1))

...so I would have said part (i) is order Θ(len(L1)*len(L2)) in the worst-case.
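To sanity-check this, I counted the inner-loop comparisons with a small instrumented copy of part (i) (count_part_i is my own helper, not from the book):

def count_part_i(L1, L2):
    # Same loops as part (i), but count how often the inner body runs
    comparisons = 0
    for e1 in L1:
        for e2 in L2:
            comparisons += 1
            if e1 == e2:
                break
    return comparisons

n = 100
# Worst case as defined above: each e1 matches only the last element of L2
print(count_part_i([0] * n, [1] * (n - 1) + [0]))  # n * n = 10000
# Best case: each e1 matches the first element of L2
print(count_part_i([0] * n, [0] + [1] * (n - 1)))  # n = 100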
Regarding part (ii), the author says that its running time is not Θ(len(tmp)) but rather Θ(len(tmp)*len(result)), since e not in result may require us to examine all elements in result, and this would run in Θ(len(result)). But since len(result) and len(tmp) are bounded by the smaller of len(L1) and len(L2), Θ(len(tmp)*len(result)) can be ignored. I understand that Θ(len(tmp)*len(result)) is an additive term and can therefore be ignored, but again I'm not sure why the author makes a blanket statement regarding part (ii) - are they saying part (ii) is Θ(len(tmp)*len(result)) in all cases?
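(Spelling out the arithmetic with the bounds the author gives: len(tmp)*len(result) ≤ min(len(L1), len(L2))² ≤ len(L1)*len(L2), so Θ(len(L1)*len(L2)) + Θ(len(tmp)*len(result)) = Θ(len(L1)*len(L2)).)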
Since the loop in part (ii) depends on the output of the loop in part (i), I figured result would be length 1 in the worst-case (as I've defined it above) and therefore part (ii)'s worst-case running time would be order Θ(len(tmp)), not Θ(len(tmp)*len(result)). This seems wrong, but I'm not sure how to characterise such loops.
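For instance, calling the book's function on the worst case I described:

# Worst case as defined above: tmp ends up as [0, 0, 0, 0, 0],
# but result never grows past one element
print(intersect([0] * 5, [1, 1, 1, 1, 0]))  # prints [0]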
I'm new to this topic so any guidance would be much appreciated.
EDIT: The same passage in an older edition of the book uses Big-O in place of Big-Theta.
The author says that the running time for part (i) is order Θ(len(L1)*len(L2)), but this seems to imply that part (i) runs in Θ(len(L1)*len(L2)) in all cases. Is this correct?
No. The author uses Big Θ in the sense they explain in section 11.2 of that same chapter, in the third edition (the text differs from the second edition) -- note the use of "worst case" below:
Many computer scientists will abuse Big O notation by making statements like, "the complexity of f(x) is O(x²)." By this they mean that in the worst case f will take no more than O(x²) steps to run. [...]
[...] we will use Big Theta (Θ) when we are describing something that is both an upper and a lower bound on the asymptotic worst-case running time.
This also explains your next thought:
...so I would have said part (i) is order Θ(len(L1)*len(L2)) in the worst-case.
You'll now understand that this is exactly what the author wants to say.
Since the loop in part (ii) depends on the output of the loop in part (i), I figured result would be length 1 in the worst-case (as I've defined it above) and therefore part (ii)'s worst-case running time would be order Θ(len(tmp)), not Θ(len(tmp)*len(result)).
It is true that you defined a worst case earlier on, but there are other worst-case instances for part (i) that also elicit worst-case behaviour in part (ii).
Take for instance the case where L1 has n unique elements, and where L2 is a copy of L1.

In part (i), the inner loop will iterate once for the first element, twice for the second, and so on. In total the inner loop makes 1+2+3+...+n iterations, which is n(n+1)/2, i.e. Θ(n²) -- a worst case. tmp will be a copy of L1.
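A quick way to verify that count (the instrumented helper below is my own, just for illustration):

def count_part_i_steps(L1, L2):
    # Same loops as part (i), but count how often the inner body runs
    steps = 0
    for e1 in L1:
        for e2 in L2:
            steps += 1
            if e1 == e2:
                break
    return steps

n = 100
L1 = list(range(n))   # n unique elements
L2 = list(L1)         # L2 is a copy of L1
print(count_part_i_steps(L1, L2))  # 1 + 2 + ... + n = n*(n+1)//2 = 5050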
Now look at part (ii): the if block will be executed in each iteration, since tmp has no duplicates. The in operation will scan all values in result. In the first iteration result is empty, so there is nothing to scan, but in the second it has 1 element, then 2, and so on. In total we get 0+1+2+...+(n-1) steps inside the in operator, which is n(n-1)/2, again Θ(n²).
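And a corresponding count for part (ii) (again my own instrumented sketch, not the book's code):

def count_part_ii_steps(tmp):
    # Count the element comparisons hidden inside "e not in result"
    steps = 0
    result = []
    for e in tmp:
        steps += len(result)   # the membership test scans all of result,
        if e not in result:    # since tmp has no duplicates
            result.append(e)
    return steps

n = 100
tmp = list(range(n))   # tmp is a copy of L1: n unique elements
print(count_part_ii_steps(tmp))  # 0 + 1 + ... + (n-1) = n*(n-1)//2 = 4950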