I would like to create a rectangular multidimensional array at runtime, such that the entire array is stored in a contiguous block of memory. It is not necessary for the array to be resizable after its initial creation. Effectively, I want to create a static array at runtime, but I would accept an approach that satisfies the stated conditions even if the array is technically of a different type.
More formally, I would like to take two ulong
s nr
and nc
, and create an array arr
at runtime such that arr[r][c]
is equivalent to *(arr.ptr + r * nc + c)
, both in terms of what it evaluates to, and the efficiency with which it does so. (*(arr.ptr + c * nr + r)
would also be acceptable, although I don't imagine D would use column-major order.)
Is there a way to do this in D?
The closest I've gotten is:
import std.stdio, core.memory;
void main() {
ulong nr = 3, nc = 4;
auto p = cast(int*)GC.malloc(nr * nc * int.sizeof);
auto p1 = cast(int[3][4])p[0..12];
}
But that fails to compile if I change the cast(int[3][4])
to cast(int[nr][nc])
.
To compare the different answers provided, I ran the following function (with the indexing expression adjusted as needed) over a 10,000,000 x 20 array of doubles.
auto busywork(T)(T p) {
foreach (r; 0..nr) {
foreach (c; 0..nc) {
p[r][c] = r + log(1.0 + c);
}
}
double result = 1.0;
foreach (t; 0..1) {
foreach (r; 0..nr) {
foreach (c; 0..nc) {
result += log(1.0 + pow(p[r][c], 3));
}
}
}
return result;
}
p[r * nc + c]
.The table reports the median runtime in milliseconds over five runs.
dmd -O | gdc -O3 | gdc -Os | ldc -O3 | |
---|---|---|---|---|
baseline | 13592 +/- 1758 | 20978 +/- 434 | 15452 +/- 1178 | 20778 +/- 1017 |
baseline_dynamic | 13481 +/- 339 | 20782 +/- 501 | 13476 +/- 209 | 20632 +/- 156 |
chunks | 15800 +/- 144 | 21014 +/- 176 | 13676 +/- 118 | 21203 +/- 212 |
chunksarray | 12903 +/- 649 | 19648 +/- 1080 | 12895 +/- 519 | 19770 +/- 816 |
array2d | 12606 +/- 606 | 19844 +/- 1042 | 12640 +/- 540 | 19724 +/- 953 |
Naturally, relative overhead varies based on the array dimensions and type.
The tremendous difference between -O3
and -Os
, and the fact that array2d
under dmd -O
performs better than the baseline in some runs (despite the fact that examination of the generated assembly (of a shorter program) showed that opIndex
was not being inlined by dmd) suggests that the number of cache misses is the dominating factor, at least when the array is eating up a decent proportion of the system's RAM. (Setting -boundscheck=off brings the dmd -O
version of the baseline in line with the array2d performance, so that's apparently why it array2d did better sometimes.)
chunksarray has the advantages of being shorter code and using the same syntax as dynamic arrays, and might become preferable in cases where nc
and/or T.sizeof
are large enough that the relative memory overhead becomes less significant, but array2d appears to be the best option in the general case.
Static and dynamic arrays in D have different layout in memory.
A dynamic array is a reference type, and a static array is a value type.
So, there is no way to cast a block of memory into a multidimensional dynamic array.
Instead you will have to create your own type with an overloaded indexing operator.
Basic example:
import std.stdio : writeln;
void main() {
auto arr = Array2D!int(3, 4);
arr[2, 3] = 23;
// arr[3, 3] = 33; // range violation
writeln(arr[2, 3]);
writeln(arr.data[0..12]);
}
struct Array2D(T) {
import core.memory : GC;
import std.exception : enforce;
import core.exception : RangeError;
size_t nr, nc;
T* data;
this(size_t nr, size_t nc) {
this.nr = nr;
this.nc = nc;
auto ds = nr * nc;
data = cast(T*) GC.malloc(T.sizeof * ds);
data[0..ds] = 0;
}
ref T opIndex(size_t r, size_t c) {
enforce!RangeError(r >= 0 && r < nr);
enforce!RangeError(c >= 0 && c < nc);
return data[c * nr + r];
}
}
Basic generic example:
import std.stdio : writeln;
void main() {
auto arr = makeArrayMD!int(2, 3, 4);
arr[1, 2, 3] = 123;
// arr[2, 2, 3] = 223; // range violation
writeln(arr[1, 2, 3]);
writeln(arr.data[0..24]);
}
struct ArrayMD(T, size_t dims) {
import core.memory : GC;
import std.algorithm : fold;
import std.exception : enforce;
import core.exception : RangeError;
size_t[dims] sizes;
T* data;
this(Sizes...)(Sizes szs) if(Sizes.length == dims) {
sizes = [szs];
auto ds = sizes.fold!"a * b";
data = cast(T*) GC.malloc(T.sizeof * ds);
data[0..ds] = 0;
}
ref T opIndex(Sizes...)(Sizes szs) {
size_t idx = 0;
size_t rs = 1;
static foreach(i, s; szs) {
enforce!RangeError(s >= 0 && s < sizes[i]);
static if(i > 0) rs *= sizes[i - 1];
idx += s * rs;
}
return data[idx];
}
}
auto makeArrayMD(T, DS...)(DS dims) {
return ArrayMD!(T, DS.length)(dims);
}