
How to map 2D matrix entries to a 1D array with a "pointer array" indirection?

Note: I've edited this question with the helpful feedback of Keith Randall and Chiyou.

I have an idea to use a 1D array to cache the most used entries of a 2D NxM matrix in a particular processing context. My question is whether there is already a body of work to gain insight from, or whether this is simply known to people in the know. I will outline the leg-work first and then ask the question. As an additional note, Chiyou has already proposed space-filling curves, which conceptually look like what I'm after but don't readily answer my specific question (i.e. I can't find a curve that does what I need).

The leg-work: currently, the most used entries are defined by a combination of loops and a special pointer_array (see the code below), which together produce the indices of the most used matrix elements. The pointer_array contains numbers in the range [0, max(N, M)] (with the matrix dimensions NxM defined above).

The purpose of the code is to produce a suitable (in the context of the program) combination of indices for processing some of the matrix entries. N may equal M, i.e. the matrix may be square. The C++ code that traverses the matrix looks like this:

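for (int i = 0; i < N; i++)
{
    for (int j = i + 1; j < M; j++)
    {
        // pointer_array[i + 1] and pointer_array[j + 1] are read too, so
        // pointer_array needs at least max(N, M) + 1 elements (e.g. a closed
        // route whose first entry is repeated at the end).
        auto pointer1 = pointer_array[i];
        auto pointer2 = pointer_array[i + 1];
        auto pointer3 = pointer_array[j];
        auto pointer4 = pointer_array[j + 1];

        // In 2-opt terms: entry1 and entry2 are the two candidate new edges,
        // entry3 and entry4 the two edges that would be removed.
        auto entry1 = some_matrix[pointer1][pointer3];
        auto entry2 = some_matrix[pointer2][pointer4];
        auto entry3 = some_matrix[pointer3][pointer4];
        auto entry4 = some_matrix[pointer1][pointer2];

        //Computing with the entries...
    }
}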

Some points worth noting separately:

To make this caching worthwhile, I think the cache should be dense: something that can be accessed randomly and is laid out contiguously in memory, with no space "wasted" (except perhaps if there are many separate chunks) and O(1) access. That means the cache can't simply be the entire 2D matrix stored as a 1D row- or column-major array. I feel it should fit into an array of length max(N, M) (see the matrix dimensions defined above).
Many of the entries in some_matrix will be ignored during the loops.
The pointer_array ordering records a route through the matrix, so it's used elsewhere too.
I've run into this need on various occasions; this time I've written a 2-opt algorithm whose memory access pattern I'm trying to improve. I'm aware there are other ways to write the algorithm, but I've encountered this in other settings too and am wondering whether there is a general solution.
Outside-the-box thinking would be to produce a similar combination of indices into something that is conceptually like a 2D matrix, only smarter. One option would be to cache matrix rows/columns during the loops.
As the entries will be used a lot, it would be better to replace the pointer_array indirection by caching the some_matrix lookups, so that the entry* values in the loops are fetched from a preloaded cache instead of a possibly rather large matrix. Another benefit is reduced storage, which in turn means the frequently used values could well fit in a faster cache.
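A minimal sketch of the part of this that does fit into a length-n array, assuming an n x n distance matrix `dist`, a closed route `route` of n + 1 positions whose last element repeats the first, and a hypothetical helper name `build_edge_cache`. entry3 and entry4 are costs of edges between consecutive route positions (j, j + 1) and (i, i + 1), so they can be precomputed once per position. entry1 and entry2 pair position i with every later position j, i.e. about n(n - 1)/2 distinct pairs, so under these assumptions they cannot be packed into max(N, M) slots.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: precomputes the cost of every edge along the route.
// Assumes `route` is a closed tour stored as n + 1 indices with
// route[n] == route[0], and `dist` is indexed by the values in `route`.
// edge_cost[k] == dist[route[k]][route[k + 1]], so in the loop entry3 and
// entry4 become edge_cost[j] and edge_cost[i]: a dense array of length n.
std::vector<double> build_edge_cache(const std::vector<std::vector<double>>& dist,
                                     const std::vector<int>& route)
{
    assert(route.size() >= 2);
    std::vector<double> edge_cost(route.size() - 1);
    for (std::size_t k = 0; k + 1 < route.size(); ++k)
        edge_cost[k] = dist[route[k]][route[k + 1]];
    return edge_cost;
}
```

After an accepted 2-opt move, which reverses part of the route, the affected part of `edge_cost` would have to be refreshed before the next scan.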

Question: Is it possible to devise an indexing scheme so that the matrix indexing essentially becomes

//Load the 2D matrix entries into some_1d_cache using the pointer_array
//so that the loop indices can be used directly...

for (int i = 0; i < N; i++)
{
    for (int j = i + 1; j < M; j++)
    {
        //The some_1d_cache(x, y) call will calculate an index into the 1D cache array...
        auto entry1 = some_1d_cache(i, j);
        auto entry2 = some_1d_cache(i + 1, j + 1);
        auto entry3 = some_1d_cache(j, j + 1);
        auto entry4 = some_1d_cache(i, i + 1);

        //Computing with the entries...
    }
}
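One way to realise the first scheme, sketched under the assumption that the loop only ever asks for pairs of route positions a < b (which holds for (i, j), (i + 1, j + 1), (j, j + 1) and (i, i + 1) when i < j): copy dist[route[a]][route[b]] into a packed strictly-upper-triangular array once per route. The class name `RouteOrderedCache`, the parameter names and the layout are illustrative assumptions, not an existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of some_1d_cache(i, j) for route positions i < j.
// Assumes `route` holds m route positions (for a closed tour, m = n + 1 with
// route[m - 1] == route[0]) and `dist` is a square matrix indexed by city id.
// Entries are copied in route order into a packed strictly-upper-triangular
// array, so the loop can use its own indices i and j directly.
class RouteOrderedCache {
public:
    RouteOrderedCache(const std::vector<std::vector<double>>& dist,
                      const std::vector<int>& route)
        : m_(route.size()), data_(m_ * (m_ - 1) / 2)
    {
        for (std::size_t a = 0; a < m_; ++a)
            for (std::size_t b = a + 1; b < m_; ++b)
                data_[index(a, b)] = dist[route[a]][route[b]];
    }

    // Requires a < b < m; plain O(1) arithmetic, no pointer_array lookup.
    double operator()(std::size_t a, std::size_t b) const
    {
        assert(a < b && b < m_);
        return data_[index(a, b)];
    }

private:
    // Row a of the strict upper triangle starts after
    // (m-1) + (m-2) + ... + (m-a) = a*m - a*(a+1)/2 earlier entries.
    std::size_t index(std::size_t a, std::size_t b) const
    {
        return a * m_ - a * (a + 1) / 2 + (b - a - 1);
    }

    std::size_t m_;
    std::vector<double> data_;
};
```

With this layout, (i, j) and (i + 1, j + 1) are read sequentially as j increases. The trade-offs are that the array holds m(m - 1)/2 entries rather than max(N, M), and because it is ordered by route position, a 2-opt move that reverses a segment invalidates every entry involving a position inside that segment.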

Or perhaps something like the following would work too:

for (int i = 0; i < N; i++)
{        
    for (int j = i + 1; j < M; j++)
    {            
        auto pointer1 = pointer_array[i];
        auto pointer2 = pointer_array[i + 1];
        auto pointer3 = pointer_array[j];
        auto pointer4 = pointer_array[j + 1];

        //The some_1d_cache(x, y) call will calculate an index into the 1D cache array...
        auto entry1 = some_1d_cache(pointer1, pointer3);
        auto entry2 = some_1d_cache(pointer2, pointer4);
        auto entry3 = some_1d_cache(pointer3, pointer4);
        auto entry4 = some_1d_cache(pointer1, pointer2);

        //Computing with the entries...
    }
}
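The second scheme, keyed by the values from pointer_array, can at least avoid storing each distance twice when some_matrix is symmetric (as in a symmetric 2-opt). A minimal sketch, assuming an n x n symmetric `dist`; `PackedSymmetric` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of some_1d_cache(x, y) keyed by city ids.
// Assumes `dist` is an n x n symmetric matrix, so only the lower triangle
// including the diagonal is stored: n*(n+1)/2 values instead of n*n.
class PackedSymmetric {
public:
    explicit PackedSymmetric(const std::vector<std::vector<double>>& dist)
        : n_(dist.size()), data_(n_ * (n_ + 1) / 2)
    {
        for (std::size_t r = 0; r < n_; ++r)
            for (std::size_t c = 0; c <= r; ++c)
                data_[index(r, c)] = dist[r][c];
    }

    double operator()(std::size_t x, std::size_t y) const
    {
        assert(x < n_ && y < n_);
        if (x < y) std::swap(x, y);  // symmetry: (x, y) and (y, x) share a slot
        return data_[index(x, y)];
    }

private:
    // Rows 0..r-1 of the lower triangle hold 1 + 2 + ... + r = r*(r+1)/2 values.
    static std::size_t index(std::size_t r, std::size_t c)
    {
        return r * (r + 1) / 2 + c;
    }

    std::size_t n_;
    std::vector<double> data_;
};
```

Because it is keyed by city ids rather than route positions, this layout does not change when the route changes, but it keeps the pointer_array indirection, so the reads remain scattered; it only halves the storage.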

Solution

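You can index a 2D matrix with a 1D space-filling curve. For the Z-order (Morton) curve, for example, the index is H(x, y) = h(x) + 2*h(y), where h spreads the bits of its argument onto the even bit positions, so the bits of x and y end up interleaved. Space-filling curves are also useful because they recursively subdivide the plane and reorder it so that some locality is preserved: cells that are close in 2D tend to stay close in the 1D order.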
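For reference, the Z-order (Morton) curve is one concrete space-filling curve whose index can be computed in O(1): spread the bits of each coordinate onto every other bit position (call that h) and combine them as h(x) + 2*h(y), i.e. interleave the bits of x and y. A minimal sketch for coordinates below 2^16; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Spreads the low 16 bits of v so that bit k moves to bit 2k (the "h").
std::uint32_t spread_bits(std::uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Z-order index of (x, y): bits of x land on even positions, bits of y on
// odd positions, i.e. spread_bits(x) + 2 * spread_bits(y).
std::uint32_t morton_index(std::uint32_t x, std::uint32_t y)
{
    assert(x <= 0xFFFFu && y <= 0xFFFFu);
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

Storing some_matrix in this order keeps aligned 2x2, 4x4, ... blocks contiguous. For N that is not a power of two the layout needs padding, so it is not fully dense, and calling it with pointer values rather than loop indices would still scatter the accesses along the route.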