Demoting to float2 when using seq with double2 arrays in ArrayFire


I'm using the following test code, which relies on the ArrayFire library.

void test_seq(const array& input, array& output, const int N)
{
    array test      = seq(0,N-1);                                         
    output          = input;
}

(for the moment `array test` has no role)

double2* test_CPU; test_CPU=(double2*)malloc(10*sizeof(double2));       
for (int k=0; k<10; k++) { test_CPU[k].x=2.; test_CPU[k].y=1.; }
array test_GPU(10, test_CPU);
array test_GPU_output = constant(0.,10, c64);
test_seq(test_GPU,test_GPU_output,10);
print(test_GPU_output);
try {
    double2 *CPU_test = test_GPU_output.host<double2>();
    printf("%f %f\n",CPU_test[0].x,CPU_test[0].y);
} catch (af::exception& e) {
    fprintf(stderr, "%s\n", e.what());
}

and everything compiles and runs correctly.

However, when I change the above function to

void test_seq(const array& input, array& output, const int N)
{
    array test      = seq(0,N-1);                                         
    output          = input * test;
}

I receive the following runtime error message

src/gena/gtypes.cpp:112: error: requested cuDoubleComplex from array of type cuComplex

If, on the other hand, I change the line

double2 *CPU_test = test_GPU_output.host<double2>();

to

float2 *CPU_test = test_GPU_output.host<float2>();

everything runs fine again. It seems that using seq causes a demotion to float2. The problem does not go away if I use something like seq(0,N-1,f64) (I don't even know whether ArrayFire allows that).

How can I keep double2 processing and avoid demoting to float2?


Solution

  • When a seq is converted to an array, the result is stored in single precision (float).

    Currently in ArrayFire, the rule for an operation involving two arrays of different precision is to choose the lower precision. This is why the result of input * test is converted from double precision to single precision (and hence to float2).

    The solution for now is to add the following line right after test is created.

    test = test.as(f64);
    

    This adds very little overhead, because the array is not actually generated until it is needed.
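To make the rule above concrete, here is a toy model of it in plain C++. It does not use ArrayFire and is not ArrayFire's implementation: the `DType` enum and the `is_complex`, `is_double` and `result_type` helpers are hypothetical names invented for this sketch. It also assumes that the result stays complex whenever either operand is complex, which is what the cuComplex in the error message suggests.

```cpp
// Hypothetical stand-ins for the four dtypes discussed above.
enum class DType { f32, c32, f64, c64 };

constexpr bool is_complex(DType t) { return t == DType::c32 || t == DType::c64; }
constexpr bool is_double(DType t)  { return t == DType::f64 || t == DType::c64; }

// Toy version of the "choose the lower precision" rule:
// the result is double precision only if BOTH operands are double precision,
// and (assumption) it is complex if EITHER operand is complex.
constexpr DType result_type(DType a, DType b) {
    return (is_complex(a) || is_complex(b))
        ? ((is_double(a) && is_double(b)) ? DType::c64 : DType::c32)
        : ((is_double(a) && is_double(b)) ? DType::f64 : DType::f32);
}

// input (c64) * test (f32, straight from seq) is demoted to c32 ...
static_assert(result_type(DType::c64, DType::f32) == DType::c32,
              "mixed precision falls back to single precision");

// ... while input (c64) * test.as(f64) keeps double precision.
static_assert(result_type(DType::c64, DType::f64) == DType::c64,
              "matching double precision is preserved");
```

Under this model, the original `input * test` mixes a double-precision complex array with a single-precision real one, so the product is single-precision complex and `host<double2>()` fails. Once `test` has been cast with `as(f64)`, both operands are double precision and the product stays double-precision complex.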