Search code examples
pythonalgorithmcartesian-coordinates

Why GCD is needed in this algorithm finding all groups of three points are collinear in Cartesian coordinate plane?


I was trying to solve this coding challenge: "Given an array of pairs, each pair (x, y), which are both integer, denotes coordinate of a point in Cartesian coordinate plane, find many groups of three points are collinear."

It turns out this below is the correct algorithm:

def gcd(x, y):
    if y == 0:
        return x
    else:
        return gcd(y, x % y)

def how_many_3point_in_line( points ):
    ans = 0
    n = len( points )
    for i in range( n - 2 ):
        slopes = defaultdict( int )
        for j in range( i + 1, n ):
            dy, dx = points[i][1] - points[j][1], points[i][0] - points[j][0]
            gcd_num = gcd( dx, dy )
            slopes[( dy // gcd_num, dx // gcd_num )] += 1
        for _, occ in slopes.items():
            ans += math.comb( occ, 2 )
    return ans

Apparently the gcd is used to represent slope, what issue does it address?


Solution

  • A clarification:

    Many thanks for fellow no comment's teaching, indeed the GCD in OP's posted solution is needed, and my original answer is wrong.

    Taking points = [(0, 0), (999999997, 999999998), (999999998, 999999999)] as an example, the difference between 999999997/999999998 and 999999998/999999999 appears only after so many decimal digits that floating point number does not include, leading to 999999997/999999998 == 999999998/999999999 being evaluated to be True.

    I also found this answer that nicely discussed about related topic.

    This answer has been marked as accepted so I cannot delete it; I'll keep it here as a counterexample.


    Original answer ( wrong ):

    I guess the author of your code is trying to address inaccuracy of floating point representation.

    But that worry is actually unnecessary given that both x and y are integer, e.g., 3.0/9.0 == 9.0/27.0 will give you True, I recently studied on related topic and can hence confirm this. The only case where you (may or may not) get trouble is when either the denominator or numerator is floating point number, which does not apply in your question.

    So, you can modify the solution to simply use numeric slope, but you need to specially handle case where dx is 0, otherwise you get “divided by zero” exception.

    def how_many_3point_in_line( points ):
        ans = 0
        n = len( points )
        for i in range( n - 2 ):
            slopes = defaultdict( int )
            numVertical = 0
            for j in range( i + 1, n ):
                dy, dx = points[i][1] - points[j][1], points[i][0] - points[j][0]
                if dx != 0:
                    slopes[dy / dx] += 1
                else:
                    numVertical += 1
            for _, occ in slopes.items():
                ans += math.comb( occ, 2 )
            ans += math.comb( numVertical, 2 )
        return ans