Precision different while calling from Raku nativecall

I was looking to write linspace function of NumPy.

As the loops are faster in compiled code, tried writing in C and calling from Raku.

//  C code
#include <stdio.h> 
#ifdef _WIN32 
#define DLLEXPORT __declspec(dllexport)
#else 
#define DLLEXPORT extern // if c++ code, requires extern "C"
#endif

DLLEXPORT void c_linspace(double start, double step, int num, double* vals) {
    for (int i = 0; i < num; i++)
{
    vals[i] = start;
    start += step;
}
}

// Raku code
sub c_linspace(num64, num64, int32, CArray[num64]) 
    is native( MYDYN) { * };

sub raku_linspace($start, $end, $num, :$endpoint = True, :$retstep = False) {
    my $step = $endpoint ?? ($end - $start)/($num - 1) !!  ($end-$start)/($num);
    my $vals = CArray[num64].allocate($num);
    c_linspace($start.Num, $step.Num, $num.Int, $vals);
    $retstep ?? ($vals.list, $step) !! $vals.list
}

Works OK, but checking the precision of the results for example:

Raku gives:

say raku_linspace(1,3,300000)[200]   # gives 1.0013333377777869

While NumPy gives:

import numpy as np
arr = np.linspace(1,3,300000)[200]
print(arr) # gives 1.0013333377777927

Why this difference in precision? My expectation was double was supposed to maintain precision to 15 decimal digits.

The docs mention double is mapped to num64.

Also mentions there are Rakudo specific types where long, longlong is mentioned. So what does long double maps to in Raku. It does not appear to be num64.

System information:
Windows 10 64 bit
gcc 13.2.0

It looks like the problem is directly in C due to floating point errors rather than calling via nativecall. Because running C gives what Raku is giving.

Yet, is there mechanism in nativecall to handle long double or long long preferably with an example?

Also what could be done to make this function with the same precision as NumPy?

Solution

I don't know if we should un-tag this question, edit it, or another thing, but anyway: this is not a NativeCall question, nor a Raku question, it's a C question, and the "real" question here is:

Why is my c_linspace generating a different result from numpy.linspace?

The answer to this question is: numpy's linspace uses a different calculation, probably to use SMP instructions. It does (simplifying a bit):

    y = _nx.arange(0, num, dtype=dt).reshape((-1,) + (1,) * ndim(delta))
    step = delta / div
    y *= step
    y += start

The C equivalent would be:

void c_linspace(double start, double step, int num, double* vals) {
  for (int i = 0; i < num; i++)
    vals[i] = (double) i;
  for (int i = 0; i < num; i++)
    vals[i] *= step;
  for (int i = 0; i < num; i++)
    vals[i] += start;
}

If you use this implementation, your Raku code will return the same result of 1.0013333377777927 as the numpy version.

But, as stated in the comments, even

void c_linspace(double start, double step, int num, double* vals) {
  for (int i = 0; i < num; i++)
    vals[i] = i * step + start;
}

would suffice...

Answering your other question, i.e.,

Yet, is there mechanism in nativecall to handle long double or long long preferably with an example?

long long is supported in rakudo, the type is called longlong or ulonglong in its unsigned variant. The long double type is not supported (yet?) but it should be easy (although not trivial) to add support for it in https://github.com/rakudo/rakudo/blob/main/lib/NativeCall.rakumod (the not trivial part is to complete the tests etc).