Search code examples
pythonpython-3.xalgorithmcomparison

Second or last but one


Walking through the code interviews of IT international companies I run into interesting problem.

How many comparisons do we have to make to figure out what element out of six is the second smallest or the second largest.

None of these six elements have the same value.

  • we have main function with six arguments (v1, ..., v6)

     def solve(v1, v2, v3, v4, v5, v6):
         # the worst case -> 9 comparisons
         if isLarger(v1, v2):
             v1, v2 = v2, v1
         if isLarger(v1, v3):
             v1, v3 = v3, v1
         if isLarger(v1, v4):
             v1, v4 = v4, v1
         if isLarger(v1, v5):
             v1, v5 = v5, v1
         if isLarger(v1, v6):
             v1, v6 = v6, v1
         if isLarger(v2, v3):
             v2, v3 = v3, v2
         if isLarger(v2, v4):
             v2, v4 = v4, v2
         if isLarger(v2, v5):
             v2, v5 = v5, v2
         if isLarger(v2, v6):
             v2, v6 = v6, v2
         print(f"#comparisons = {CntComparisons}")
         return v2
    

    which returns the second smallest or the second largest value.

  • Determine this value by comparison (i.e. it cannot use indexing by that value).

  • For pairwise comparison we can use only the below function

    CntComparisons = 0
    def isLarger(v1, v2):
        global CntComparisons
        CntComparisons += 1
        return v1 > v2
    
  • Values are compared only by calling the comparison function isLarger(v1, v2).

The goal is to find an algorithm that requires (even in the worst case) as few comparisons as possible!

Any ideas or hint how to solve this?


Solution

  • A trivial, but not optimal way is using the first two passes of bubble sort: this will swap pairs of variables and so bubble the 2 greatest values to the "right" and assign them to v5 and v6, and so v5 will be returned as correct answer. This requires 9 comparisons: 5 in the first pass, 4 in the second. So we have an upper bound of 9 comparisons.

    A lower bound is 5, since that is the number of comparisons needed to find either the minimum or the maximum, and that will be needed to be sure a value is the second least or second greatest.

    Here is an idea:

    • Perform 2 to 3 comparisons to sort v1, v2, v3 through swaps from least to greatest. We then know that v2 is not the least nor the greatest.

    • Perform 3 comparisons between v2 and the last 3 values (v3, v4 and v5).

    • If these comparisons all give the same boolean result, it means that v2 is indeed the answer.

    • If among those boolean results there is only one True (i.e. v2 is only greater than one of them), then let vx be the one among v3, v4 or v5 for which v2 is greater than vx. Return the greatest of v1 and vx: this needs another, 7th comparison.

    • If among those boolean results there is only one False (i.e. v2 is greater than two of them), then let vx be the one among v3, v4 or v5 for which v2 is not greater than vx. Return the least of v3 and vx: this needs another, 7th comparison.

    So in the worst case we get the result with 7 comparisons. The best case needs 5 comparisons (2 for the sorting step, and c4, c5 and c6).

    Here is an implementation of this idea:

    def solve(v1, v2, v3, v4, v5, v6):
        # Sort (v1, v2, v3)
        if isLarger(v1, v2):
            v1, v2 = v2, v1
        if isLarger(v2, v3):
            v2, v3 = v3, v2
            if isLarger(v1, v2):
                v1, v2 = v2, v1
        # v2 is now certainly not the greatest nor the least
        # Check how it compares with v4, v5 and v6
        c4 = isLarger(v2, v4)
        c5 = isLarger(v2, v5)
        c6 = isLarger(v2, v6)
        if c4 == c5 == c6:  # All true or all false
            return v2
        if c4 ^ c5 ^ c6:  # Only one of them is true
            vx = v4 if c4 else (v5 if c5 else v6)
            return v1 if isLarger(v1, vx) else vx
        else:  # Only one of them is false
            vx = v4 if not c4 else (v5 if not c5 else v6)
            return vx if isLarger(v3, vx) else v3
    

    I ran this on all permutations of [1,2,3,4,5,6] and these are the results:

    • 5 comparisons needed for 96 permuations
    • 6 comparisons needed for 336 permutations
    • 7 comparisons needed for 288 permutations

    On average: 6.266667 comparisons