
Wrong answer due to precision issues?


I am implementing a greedy approach to the TSP:

  1. Start from first node.

  2. Go to nearest node not visited yet. (If multiple, go to the one with the lowest index.)

  3. Don't forget to include distance from node 1 to last node visited.

However, my C++ code gives the wrong answer. I implemented the same algorithm in Python, and the Python code gives the right answer.

In my problem, the nodes are coordinates on a 2-D plane and the distance is the Euclidean distance.

I even changed everything to long double because it's more precise.

In fact, if I reverse the direction of the for loop and add an extra if statement to handle ties (we want the nearest node with the minimum index), I get a very different answer.

Is this because of precision issues?

(Note: I have to print floor(ans))

INPUT: Link

EXPECTED OUTPUT: 1203406

ACTUAL OUTPUT: 1200403

#include <iostream>
#include <cmath>
#include <vector>
#include <cassert>
#include <functional>

using namespace std;

int main() {
    freopen("input.txt", "r", stdin);

    int n;
    cin >> n;

    vector<pair<long double, long double>> points(n);
    for (int i = 0; i < n; ++i) {
        int x;
        cin >> x;
        assert(x == i + 1);

        cin >> points[i].first >> points[i].second;
    }

    // Returns the squared Euclidean Distance
    function<long double(int, int)> dis = [&](int x, int y) {
        long double ans = (points[x].first - points[y].first) * (points[x].first - points[y].first);
        ans += (points[x].second - points[y].second) * (points[x].second - points[y].second);
        return ans;
    };

    long double ans = 0;
    int last = 0;
    int cnt = n - 1;
    vector<int> taken(n, 0);
    taken[0] = 1;

    while (cnt > 0) {
        pair<long double, int> mn = {1e18, 1e9};
        for (int i = 0; i < n; ++i) {
            if (!taken[i]) {
                mn = min(mn, {dis(i, last), i});
            }
        }

        int nex = mn.second;
        taken[nex] = 1;
        cnt--;
        ans += sqrt(mn.first);
        last = nex;
    }

    ans += sqrt(dis(0, last));
    cout << ans << '\n';
    return 0;
}

UPD: Python Code:

import math

file = open("input.txt", "r")

n = int(file.readline())

a = []
for i in range(n):
    data = file.readline().split(" ")
    a.append([float(data[1]), float(data[2])])

for c in a:
    print(c)


def dis(x, y):
    cur_ans = (a[x][0] - a[y][0]) * (a[x][0] - a[y][0])
    cur_ans += (a[x][1] - a[y][1]) * (a[x][1] - a[y][1])
    cur_ans = math.sqrt(cur_ans)
    return cur_ans


ans = 0.0
last = 0
cnt = n - 1
take = []
for i in range(n):
    take.append(0)
take[0] = 1

while cnt > 0:
    idx = -1
    cur_dis = 1e18
    for i in range(n):
        if take[i] == 0:
            if dis(i, last) < cur_dis:
                cur_dis = dis(i, last)
                idx = i

    assert(idx != -1)
    take[idx] = 1
    cnt -= 1
    ans += cur_dis
    last = idx

ans += dis(0, last)

print(ans)

file.close()

# 1203406

Solution

  • Yes, the difference is due to round-off error, with the C++ code producing the more accurate result because of your use of long double. If you change your C++ code so that it uses the same precision as Python (IEEE-754 double precision), you get the exact same round-off errors in both codes. Here is a demonstrator in the Godbolt compiler explorer, with your example boiled down to 4000 points: https://godbolt.org/z/rddrdT54n

    If I run the same code on the whole input file, I get 1203406.5012708856 in both C++ and Python (I had to try this offline, because Godbolt understandably killed the process).

    Note that, in theory, your Python code and C++ code are not completely analogous, because std::min compares tuples and pairs lexicographically. So if you ever have two distances that are exactly equal, the std::min call will choose the smaller of the two indices. In practice, this does not make a difference, though.

    Now, I don't think you can really get rid of the rounding errors, but there are a few tricks to minimize them:

    • Using higher precision (long double) is one option, but it also makes your code slower; it's a tradeoff.

    • Rescale your points so that they are relative to the centroid of all points, and pick a unit that reflects your problem (e.g. don't think in mm, miles, km or whatever, but rather in "variance of your data set"). You can't get rid of numerical cancellation in the Euclidean distance calculation, but the cancellation is typically more severe when the relative distances are small compared to the absolute values of the coordinates. Here is a small demonstration:

      #include <iostream>
      #include <iomanip>
      
      int main() {
          std::cout 
              << std::setprecision(17)
              << (1000.0001 - 1000)/0.0001
              << std::endl
              << (1.0001 - 1)/0.0001
              << std::endl;
          return 0;
      }
      
      0.99999999974897946
      0.99999999999988987
      
    • Finally, there are algorithms to better control error accumulation in large sums, such as pairwise summation (https://en.wikipedia.org/wiki/Pairwise_summation) and Kahan summation (https://en.wikipedia.org/wiki/Kahan_summation_algorithm).

    One final comment, a bit unrelated to your question: Use auto with lambdas, i.e.

    auto dis = [&](int x, int y) {
        // ...
    };
    

    C++ has many different kinds of callable objects (functions, function pointers, functors, lambdas, ...), and std::function is a useful wrapper that gives one type to all callables with the same signature. This comes at some runtime overhead (runtime polymorphism, type erasure), and the compiler will have a hard time optimizing through it. So if you don't need the type-erasing functionality of std::function, just store your lambda in a variable declared with auto.