Tags: c++ · loops · parallel-processing · openmp

Parallelization of dependent nested loops


I am writing a simple N-body program in C++ and using OpenMP to speed up the computations. At some point I have nested loops that look like this:

int N;
double* S          = new double[N];
double* Weight     = new double[N];
double* Coordinate = new double[N];
 ...

#pragma omp parallel for
for (int i = 0; i < N; ++i)
{
   for (int j = 0; j < i; ++j)
   {
       double K = Coordinate[i] - Coordinate[j];
       S[i]    += K*Weight[j];
       S[j]    -= K*Weight[i];
   }
}

The issue is that I do not obtain exactly the same result when I remove the `#pragma ...`. I am guessing it has to do with the fact that the inner loop depends on the integer `i`, but I don't see how to get past that issue.


Solution

  • The problem is a data race on the updates to S[i] and S[j]. The inner index j runs over all indices below i, so while one thread updates its own S[i], another thread may be updating the very same element as its S[j]. Two threads can therefore read and write the same array element at the same time, which makes the result depend on thread timing. Both updates have to be made atomic (add #pragma omp atomic before each one) to avoid the data race and ensure memory consistency:

    for (int j = 0; j < i; ++j)
    {
        double K = Coordinate[i] - Coordinate[j];
        #pragma omp atomic          // serialize concurrent updates to this element
        S[i]    += K*Weight[j];
        #pragma omp atomic          // S[j] may be touched by several threads at once
        S[j]    -= K*Weight[i];
    }