Search code examples
arraysdmemory-layout

Is it possible to create a contiguous multidimensional array at runtime in D?


I would like to create a rectangular multidimensional array at runtime, such that the entire array is stored in a contiguous block of memory. It is not necessary for the array to be resizable after its initial creation. Effectively, I want to create a static array at runtime, but I would accept an approach that satisfies the stated conditions even if the array is technically of a different type.

More formally, I would like to take two ulongs nr and nc, and create an array arr at runtime such that arr[r][c] is equivalent to *(arr.ptr + r * nc + c), both in terms of what it evaluates to, and the efficiency with which it does so. (*(arr.ptr + c * nr + r) would also be acceptable, although I don't imagine D would use column-major order.)

Is there a way to do this in D?

The closest I've gotten is:

import std.stdio, core.memory;

void main() {
    ulong nr = 3, nc = 4;
    auto p = cast(int*)GC.malloc(nr * nc * int.sizeof);
    auto p1 = cast(int[3][4])p[0..12];
}

But that fails to compile if I change the cast(int[3][4]) to cast(int[nr][nc]).


Comparison of answers

To compare the different answers provided, I ran the following function (with the indexing expression adjusted as needed) over a 10,000,000 x 20 array of doubles.

auto busywork(T)(T p) {
    foreach (r; 0..nr) {
        foreach (c; 0..nc) {
            p[r][c] = r + log(1.0 + c);
        }
    }
    double result = 1.0;
    foreach (t; 0..1) {
        foreach (r; 0..nr) {
            foreach (c; 0..nc) {
                result += log(1.0 + pow(p[r][c], 3));
            }
        }
    }
    return result;
}
  • baseline is a one-dimensional array manually indexed using p[r * nc + c].
  • baseline_dynamic is a two-dimensional dynamic array
  • chunks_array is Bearded Beaver's approach
  • chunks is Bearded Beaver's approach without the .array
  • array2D is Jack Applegame's (2D) approach with the bounds checks removed from opIndex.

Runtime

The table reports the median runtime in milliseconds over five runs.

dmd -O gdc -O3 gdc -Os ldc -O3
baseline 13592 +/- 1758 20978 +/- 434 15452 +/- 1178 20778 +/- 1017
baseline_dynamic 13481 +/- 339 20782 +/- 501 13476 +/- 209 20632 +/- 156
chunks 15800 +/- 144 21014 +/- 176 13676 +/- 118 21203 +/- 212
chunksarray 12903 +/- 649 19648 +/- 1080 12895 +/- 519 19770 +/- 816
array2d 12606 +/- 606 19844 +/- 1042 12640 +/- 540 19724 +/- 953

Memory Use

  • chunks and array2d both had zero memory overhead over baseline.
  • chunksarray had a 9% overhead for all compilers.
  • baseline_dynamic had 18% overhead under dmd and 41% overhead for gcc and ldc.

Naturally, relative overhead varies based on the array dimensions and type.

Conclusions

The tremendous difference between -O3 and -Os, and the fact that array2d under dmd -O performs better than the baseline in some runs (despite the fact that examination of the generated assembly (of a shorter program) showed that opIndex was not being inlined by dmd) suggests that the number of cache misses is the dominating factor, at least when the array is eating up a decent proportion of the system's RAM. (Setting -boundscheck=off brings the dmd -O version of the baseline in line with the array2d performance, so that's apparently why it array2d did better sometimes.)

chunksarray has the advantages of being shorter code and using the same syntax as dynamic arrays, and might become preferable in cases where nc and/or T.sizeof are large enough that the relative memory overhead becomes less significant, but array2d appears to be the best option in the general case.


Solution

  • Static and dynamic arrays in D have different layout in memory.

    A dynamic array is a reference type, and a static array is a value type.

    So, there is no way to cast a block of memory into a multidimensional dynamic array.

    Instead you will have to create your own type with an overloaded indexing operator.

    Basic example:

    import std.stdio : writeln;
    
    void main() {
        auto arr = Array2D!int(3, 4);
        arr[2, 3] = 23;
        // arr[3, 3] = 33; // range violation
        writeln(arr[2, 3]);
        writeln(arr.data[0..12]);
    }
    
    struct Array2D(T) {
        import core.memory : GC;
        import std.exception : enforce;
        import core.exception : RangeError;
        
        size_t nr, nc;
        T* data;
        this(size_t nr, size_t nc) {
            this.nr = nr;
            this.nc = nc;
            auto ds = nr * nc;
            data = cast(T*) GC.malloc(T.sizeof * ds);
            data[0..ds] = 0;
        }
        
        ref T opIndex(size_t r, size_t c) {
            enforce!RangeError(r >= 0 && r < nr);
            enforce!RangeError(c >= 0 && c < nc);
            return data[c * nr + r];
        }
    }
    

    Basic generic example:

    import std.stdio : writeln;
    
    void main() {
        auto arr = makeArrayMD!int(2, 3, 4);
        arr[1, 2, 3] = 123;
        // arr[2, 2, 3] = 223; // range violation
        writeln(arr[1, 2, 3]);
        writeln(arr.data[0..24]);
    }
    
    struct ArrayMD(T, size_t dims) {
        import core.memory : GC;
        import std.algorithm : fold;
        import std.exception : enforce;
        import core.exception : RangeError;
        
        size_t[dims] sizes;
        T* data;
        this(Sizes...)(Sizes szs) if(Sizes.length == dims) {
            sizes = [szs];
            auto ds = sizes.fold!"a * b";
            data = cast(T*) GC.malloc(T.sizeof * ds);
            data[0..ds] = 0;
        }
        
        ref T opIndex(Sizes...)(Sizes szs) {
            size_t idx = 0;
            size_t rs = 1;
            static foreach(i, s; szs) {
                enforce!RangeError(s >= 0 && s < sizes[i]);
                static if(i > 0) rs *= sizes[i - 1];
                idx += s * rs;
            }
            return data[idx];
        }
    }
    
    auto makeArrayMD(T, DS...)(DS dims) {
        return ArrayMD!(T, DS.length)(dims);
    }