Search code examples
c++mathraytracingz-order-curve

Morton curve for non cubic areas that are a power of two


While optimizing a ray tracer I was trying to improve data locality for the intersection datastructure using a Morton space filling curve, except that my 3D space is not cubic (eg. 512x512x256). All sides are a power of two, but not all sides are the same length.

I have been unable to find any examples for non square Morton curves where the sides are a power of two. If it matters I can guarantee that the x/y axis are the same size with only the z axis being a different length.

enter image description here
Note how the width is 2x height, but it could also be 3x or 4x or any other. I have been unable to find a way how to do this.

Ideally the solution would be fast as the Morton code has to be calculated a lot. So my question is: How do I generate a space filling morton curve for non-cubic spaces? This is specifically for the GPU (Cuda).

The conditions on the dimensions are:
x, y, z are a power of two
x == y
x, y >= z
Or if easier
x, y > z


Solution

  • It would probably break for, say, nz=11, but for half of the size of the XY square it seems to work for me

    #include <cstdint>
    #include <iostream>
    
    static inline uint32_t spread(uint32_t x)
    {
        x = (x | (x << 10)) & 0x000F801F;
        x = (x | (x << 4)) & 0x00E181C3;
        x = (x | (x << 2)) & 0x03248649;
        x = (x | (x << 2)) & 0x09249249;
    
        return x;
    }
    
    static inline uint32_t morton(const uint32_t x, const uint32_t y, const uint32_t z)
    {
        return spread(x) << 0 | spread(y) << 1 | spread(z) << 2;
    }
    
    
    auto main() -> int {
        int nx = 32;
        int ny = 32;
        int nz = 16;
    
        for (int iz = 0; iz != nz; ++iz)
        {
            for (int iy = 0; iy != ny; ++iy)
            {
                for (int ix = 0; ix != nx; ++ix)
                {
                    auto m = morton(ix, iy, iz);
    
                    std::cout << m << '\n';
                }
            }
        }
    
        return 0;
    }
    

    UPDATE

    How to make Morton code work for, say, 256x256x64 (8bit*8bit*6bit): you have to spread X and Y non-equidistantly, taking into account number of bits in Z. Basically, for cube you spread evenly: each bit at position 0, 3, 6, 9, 12, 15, 18, 21, 24, leaving space for other two bits from orthogonal axes.

    So there is equidistant spread for a cube. But for the case when you have only 6 bits from Z to insert, you have to have 6 distances of 3, but no Z bits for last gap, thus last gap for X and Y spread should be only 1 bit wide. Thus, non-equidistant spread in X and Y.

    Something along the line: if Nx=Ny are number of bits in XY plane, and Nz!=Nx or Ny is number of bits along Z axis, spread gap should be 2 bits for Nz bits and gap of 1 bit for what is left. So two spread routines - one for X&Y with non-equidistant spread which now depends on Nz, and existing spread function for Z axis.

    Ok, here is a working version, seems to be doing right thing

    #include <cstdint>
    #include <iostream>
    
    #define func auto
    
    func spreadZ(uint32_t v) -> uint32_t { // 2bit gap spread
        v = (v | (v << 10)) & 0x000F801F;
        v = (v | (v << 4)) & 0x00E181C3;
        v = (v | (v << 2)) & 0x03248649;
        v = (v | (v << 2)) & 0x09249249;
    
        return v;
    }
    
    func spreadXY(const uint32_t v, const uint32_t bitsZ) -> uint32_t {
        uint32_t mask_z = (1U << bitsZ) - 1U; // to mask bits which are going to have 2bit gap
    
        uint32_t lo{ v & mask_z }; // lower part of the value where there are Z bits
        lo = spreadZ(lo);          // 2bit gap spread
    
        uint32_t hi = v >> bitsZ; // high part of the value, 1bit gap
    
        // 1bit gap spread
        hi = (hi ^ (hi << 8)) & 0x00ff00ffU;
        hi = (hi ^ (hi << 4)) & 0x0f0f0f0fU;
        hi = (hi ^ (hi << 2)) & 0x33333333U;
        hi = (hi ^ (hi << 1)) & 0x55555555U;
    
        return lo + (hi << 3*bitsZ); // combine them all together
    }
    
    func morton(const uint32_t x, const uint32_t y, const uint32_t z, const uint32_t bitsZ) -> uint32_t {
        return spreadXY(x, bitsZ) << 0 | spreadXY(y, bitsZ) << 1 | spreadZ(z) << 2;
    }
    
    func ispowerof2(const uint32_t n) -> bool {
        return n && (!(n & (n - 1u)));
    }
    
    func bit_pos(const uint32_t n) -> uint32_t {
        if (!ispowerof2(n))
            throw -1;
    
        uint32_t i{ 1u }, pos{ 1u };
    
        while (!(i & n)) { // Iterate through bits of n till we find a set bit, i&n will be non-zero only when 'i' and 'n' have a same bit         
            i = i << 1; // unset current bit and set the next bit in 'i' 
    
            ++pos; // increment position 
        }
    
        return pos;
    }
    
    func main() -> int {
        int nx = 256;
        int ny = 256;
    
        int nz = 256; //256...128...64...32...16...8...4...2...1 all works
        int bitsZ = bit_pos(nz) - 1; // should be doing try/catch
    
        for (int iz = 0; iz != nz; ++iz)
        {
            for (int iy = 0; iy != ny; ++iy)
            {
                for (int ix = 0; ix != nx; ++ix)
                {
                    auto m = morton(ix, iy, iz, bitsZ);
    
                    std::cout << m << '\n';
                }
            }
        }
    
        return 0;
    }