
OpenCL for-loop doing strange things


I'm currently implementing terrain generation in OpenCL using layered octaves of noise and I've stumbled upon this problem:

float multinoise2d(float2 position, float scale, int octaves, float persistence)
{
    float result = 0.0f;
    float sample = 0.0f;
    float coefficient = 1.0f;

    for(int i = 0; i < octaves; i++){
        // get a sample of a simple signed perlin noise
        sample = sgnoise2d(position/scale);

        if(i > 0){
            // Here is the problem:

            // Implementation A, this works correctly.
            coefficient = pown(persistence, i);

            // Implementation B, using this only the first
            // noise octave is visible in the terrain.
            coefficient = persistence;
            persistence = persistence*persistence;
        }

        result += coefficient * sample;
        scale /= 2.0f;
    }
    return result;
}

Does OpenCL parallelize for-loops, leading to synchronization issues here or am I missing something else?

Any help is appreciated!


Solution

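  • The loop itself runs sequentially within each work-item, so this is not a synchronization issue. The problem in your code is with these lines: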

    coefficient = persistence;
    persistence = persistence*persistence;
    

    They should be replaced with

    coefficient = coefficient * persistence;
    

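    Otherwise the two implementations do not produce the same coefficients. In the first implementation (and with the corrected line), the coefficient grows by just one factor of persistence on every iteration:

    pow(persistence, 1); pow(persistence, 2); pow(persistence, 3); ...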
    

    The second implementation, however, squares persistence on every iteration, so the coefficient goes

    pow(persistence, 1); pow(persistence, 2); pow(persistence, 4); pow(persistence, 8); ...
    

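    Because the exponent doubles each time, "persistence" very quickly leaves the useful range of a float. For a persistence below 1 (the usual choice) the coefficients collapse toward zero after a few octaves, so the higher octaves contribute next to nothing to your answer. For a persistence above 1 they blow up toward infinity instead.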

    EDIT: Two more things

    1. Accumulation (implementation 2) is not a good idea, especially with real numbers and with algorithms that require accuracy. You might be losing a small fraction of your information every time you accumulate on "persistence" (e.g. due to rounding). Prefer direct calculation (1st implementation) over accumulation whenever you can. (Plus, the direct calculation has no dependency between iterations, so if this loop were serial code it would be readily parallelizable.)
    2. If you are working with AMD OpenCL, pay attention to the pow() functions. I have had problems with those on multiple machines on multiple occasions. The functions sometimes seem to hang for no reason. Just FYI.