Is it possible to create a contiguous multidimensional array at runtime in D?

I would like to create a rectangular multidimensional array at runtime, such that the entire array is stored in a contiguous block of memory. It is not necessary for the array to be resizable after its initial creation. Effectively, I want to create a static array at runtime, but I would accept an approach that satisfies the stated conditions even if the array is technically of a different type.

More formally, I would like to take two ulongs nr and nc, and create an array arr at runtime such that arr[r][c] is equivalent to *(arr.ptr + r * nc + c), both in terms of what it evaluates to, and the efficiency with which it does so. (*(arr.ptr + c * nr + r) would also be acceptable, although I don't imagine D would use column-major order.)

Is there a way to do this in D?

The closest I've gotten is:

import std.stdio, core.memory;

void main() {
    ulong nr = 3, nc = 4;
    auto p = cast(int*)GC.malloc(nr * nc * int.sizeof);
    auto p1 = cast(int[3][4])p[0..12];
}

But that fails to compile if I change the cast(int[3][4]) to cast(int[nr][nc]).

Comparison of answers

To compare the different answers provided, I ran the following function (with the indexing expression adjusted as needed) over a 10,000,000 x 20 array of doubles.

auto busywork(T)(T p) {
    foreach (r; 0..nr) {
        foreach (c; 0..nc) {
            p[r][c] = r + log(1.0 + c);
        }
    }
    double result = 1.0;
    foreach (t; 0..1) {
        foreach (r; 0..nr) {
            foreach (c; 0..nc) {
                result += log(1.0 + pow(p[r][c], 3));
            }
        }
    }
    return result;
}

baseline is a one-dimensional array manually indexed using p[r * nc + c].
baseline_dynamic is a two-dimensional dynamic array
chunks_array is Bearded Beaver's approach
chunks is Bearded Beaver's approach without the .array
array2D is Jack Applegame's (2D) approach with the bounds checks removed from opIndex.

Runtime

The table reports the median runtime in milliseconds over five runs.

	dmd -O	gdc -O3	gdc -Os	ldc -O3
baseline	13592 +/- 1758	20978 +/- 434	15452 +/- 1178	20778 +/- 1017
baseline_dynamic	13481 +/- 339	20782 +/- 501	13476 +/- 209	20632 +/- 156
chunks	15800 +/- 144	21014 +/- 176	13676 +/- 118	21203 +/- 212
chunksarray	12903 +/- 649	19648 +/- 1080	12895 +/- 519	19770 +/- 816
array2d	*12606 +/- 606*	19844 +/- 1042	12640 +/- 540	19724 +/- 953

Memory Use

chunks and array2d both had zero memory overhead over baseline.
chunksarray had a 9% overhead for all compilers.
baseline_dynamic had 18% overhead under dmd and 41% overhead for gcc and ldc.

Naturally, relative overhead varies based on the array dimensions and type.

Conclusions

The tremendous difference between -O3 and -Os, and the fact that array2d under dmd -O performs better than the baseline in some runs (despite the fact that examination of the generated assembly (of a shorter program) showed that opIndex was not being inlined by dmd) suggests that the number of cache misses is the dominating factor, at least when the array is eating up a decent proportion of the system's RAM. (Setting -boundscheck=off brings the dmd -O version of the baseline in line with the array2d performance, so that's apparently why it array2d did better sometimes.)

chunksarray has the advantages of being shorter code and using the same syntax as dynamic arrays, and might become preferable in cases where nc and/or T.sizeof are large enough that the relative memory overhead becomes less significant, but array2d appears to be the best option in the general case.

Solution

Static and dynamic arrays in D have different layout in memory.

A dynamic array is a reference type, and a static array is a value type.

So, there is no way to cast a block of memory into a multidimensional dynamic array.

Instead you will have to create your own type with an overloaded indexing operator.

Basic example:

import std.stdio : writeln;

void main() {
    auto arr = Array2D!int(3, 4);
    arr[2, 3] = 23;
    // arr[3, 3] = 33; // range violation
    writeln(arr[2, 3]);
    writeln(arr.data[0..12]);
}

struct Array2D(T) {
    import core.memory : GC;
    import std.exception : enforce;
    import core.exception : RangeError;
    
    size_t nr, nc;
    T* data;
    this(size_t nr, size_t nc) {
        this.nr = nr;
        this.nc = nc;
        auto ds = nr * nc;
        data = cast(T*) GC.malloc(T.sizeof * ds);
        data[0..ds] = 0;
    }
    
    ref T opIndex(size_t r, size_t c) {
        enforce!RangeError(r >= 0 && r < nr);
        enforce!RangeError(c >= 0 && c < nc);
        return data[c * nr + r];
    }
}

Basic generic example:

import std.stdio : writeln;

void main() {
    auto arr = makeArrayMD!int(2, 3, 4);
    arr[1, 2, 3] = 123;
    // arr[2, 2, 3] = 223; // range violation
    writeln(arr[1, 2, 3]);
    writeln(arr.data[0..24]);
}

struct ArrayMD(T, size_t dims) {
    import core.memory : GC;
    import std.algorithm : fold;
    import std.exception : enforce;
    import core.exception : RangeError;
    
    size_t[dims] sizes;
    T* data;
    this(Sizes...)(Sizes szs) if(Sizes.length == dims) {
        sizes = [szs];
        auto ds = sizes.fold!"a * b";
        data = cast(T*) GC.malloc(T.sizeof * ds);
        data[0..ds] = 0;
    }
    
    ref T opIndex(Sizes...)(Sizes szs) {
        size_t idx = 0;
        size_t rs = 1;
        static foreach(i, s; szs) {
            enforce!RangeError(s >= 0 && s < sizes[i]);
            static if(i > 0) rs *= sizes[i - 1];
            idx += s * rs;
        }
        return data[idx];
    }
}

auto makeArrayMD(T, DS...)(DS dims) {
    return ArrayMD!(T, DS.length)(dims);
}