Search code examples
algorithmlinear-regressiongreatest-common-divisorreal-number

Algorithm to Find Approximate Greatest Common Denominator for Noisy Data?


I have a set of values that are multiples of some greatest common denominator, and I need to find it. If they were exact values I could simply use e.g. Euclid's algorithm to find the GCD. The problem is that they're measurements with non-negligible sampling error, not the exact values, and Euclid's algorithm doesn't work very well (it approximately computes a VERY small GCD that isn't really useful to me). While they can be thought of as points in 2D, the X values are unknown so I can't use linear regression algorithms (though by definition the X values would be integers, so I dunno if that fact might be useful in solving it in some way). Additionally, the true GCD is not necessarily an integer.

How might I be able to compute an approximate GCD?


Solution

  • The problem as posed is somewhat poorly defined. Because, obviously, when you allow approximate real numbers, then there are an infinite number of possible GCDs.

    Let's think of a measurement as a range (start, stop). That's equivalent to saying we measured (start + stop) / 2) with error (stop - start)/2, but turns out to be easy to work with. Here is an example in Python.

    def divisor_ranges (measurement):
        start, stop = measurement
        i = 1
        while stop / (i+1) < start / i:
            yield (start / i, stop / i)
            i += 1
        yield (0, stop / i)
    

    Given a single measurement, we get a series of ranges. where we might find a divisor. We're trying to intersect all of the series, for all of our measurements. So let's try to get just the part of this that intersects wth a given range.

    def divisor_ranges_in_range (measurement, range_measurement):
        start, stop = measurement
        range_start, range_stop = range_measurement
        i = math.ceil(start / range_stop)
        while True:
            if stop / i <= range_start:
                break # Too small
            elif stop / (i+1) < start / i:
                yield (max(range_start, start/i), min(range_stop, stop/i))
                i += 1
            else:
                yield (range_start, min(range_stop, stop/i))
                break # Just issued our final range.
    

    And now we can combine these recursively to find where common divisors are.

    def common_divisor_ranges (measurements):
        if len(measurements) == 1:
            yield from divisor_ranges(measurements[0])
        elif 1 < len(measurements):
            measurement = measurements[0]
            for range_measurement in common_divisor_ranges(measurements[1:]):
                yield from divisor_ranges_in_range(measurement, range_measurement)
    

    And now we can find a plausible answer quite easily.

    def gcd (measurements):
        for answer in common_divisor_ranges(measurements):
            return (answer[0] + answer[1])/2
    

    And as a demonstration:

    print(gcd([(98, 102), (50, 52), (129, 133)]))
    

    Which will quickly find 10.1 as our GCD. Corresponding to the actual values 10*10.1 = 101, 5*10.1 = 50.5, 13*10.1 = 131.3.

    I suspect that there are hidden requirements. For example the fact that your measurements are integers, suggests that you might want the underlying values to be integers. That would require additional logic to come up with candidate GCDs out of a range where common divisors can be found.