Search code examples
c++algorithmdata-structureslow-levelmemory-layout

Allocating/accessing a 2d array such that 2d sub-blocks are contiguous


I have a 2d array SomeData* data[4096][4096]. Here the data is contiguous along the last coordinate, so that iterating over the y coordinate is faster than iterating over the x coordinate, due to memory locality. However, the access pattern I have is that I look at an entry, and then look at nearby neighbours in both coordinates, i.e. data[x][y] along with data[x-1][y-1], data[x+1][y+1], etc.

If I could allocate this array such that small 2d sub-blocks were contiguous in memory, that would speed things up.

I say allocate, but I suspect the solution here is to allocate a 2d array normally, and then do some trick with the indexing, such that I'm accessing 2d blocks contiguously. In other words, I want some lookup function that translates the coordinates, but I can't immediately see what it should be.

SomeData* data[4096][4096];

SomeData* lookup(size_t x, size_t y) {
    //??? Some logic to translate x,y such that 2d blocks are accessed contiguously.
}

The data array is guaranteed to have both dimensions be a power of two.


Solution

  • Lets say we have a nxm-grid. We want to subdivide the grid into bxb-blocks. It is necessary that n%b=0 and m%b=0.

    Lets call the overall indices I=0,...,n-1 and J=0,...,m-1 and the indices in a block i=0,...,b-1 and j=0,...,b-1.

    I have tried to sketch the layout here.

    Given I, the column index of the block is (I/b) and the in-block-index i=I%b. The row-index of the block is (J/b) and the in-block-index j=J%b.

    Each full block contins b*b elements. Therefore a full row of blocks contains (n/b)bb = n*b elements.

    Putting all together the linear index of element (I,J) is:

    • (I%b) [the column in the block preceding the element]
    • +(J%b) * b [the rows in the block preceding the element]
    • +(I/b) * b*b [the column of blocks preceding the block of the element]
    • +(J/b) * n*b [the row of blocks preceding the block of the element]

    A rough sketch of a runtime-sized blocked-grid-class:

    template <typename T>
    class Block_Grid
    {
    public:
      Block_Grid(std::size_t n, std::size_t m, std::size_t b)
      : _n(n), _m(m), _b(b), _elements(n * m)
      { 
        assert(n % b == 0);
        assert(m % b == 0);
      }
    
      T & operator()(std::size_t i, std::size_t j)
      {
        return _elements[index(i, j)];
      }
    
    protected:
    private:
      std::size_t index(int i, int j) const
      {
        return (i % b) 
               + (j % b) * b
               + (i / b) * b * b
               + (j / b) * n * b;
      }
    
      std::size_t _n;
      std::size_t _m;
      std::size_t _b;
      std::vector<T> _elements;
    };
    

    or a compile-time-sized class

    template <typename T, std::size_t N, std::size_t M, std::size_t B>
    class Block_Grid
    {
      static_assert(N % B == 0);
      static_assert(M % B == 0);
    public:
      Block_Grid() = default;
    
      T & operator()(std::size_t i, std::size_t j)
      {
        return _elements[index(i, j)];
      }
    
    protected:
    private:
      std::size_t index(std::size_t i, std::size_t j) const
      {
        return (i % B) + (j % B) * B + (i / B) * B * B + (j / B) * N * B;
      }
    
      std::array<T, N * M> _elements;
    };