Search code examples
javaarraysperformancememory-managementcomputer-science

What are the benefits of flattening the 3-D array to 1-D array?


I have a 3-D array represented in Java as

int[][][] arr = new int[X][Y][z]

If I try to implement the 3-D array using 1-D as

int[] oneDArray = new int[X*Y*Z] 

index of an element arr[i][j][k] in 1-D will be calculated as

indx = X*Y*k + i*Y + j;

Though in the memory location the any N-D array will be stored sequentially.
With the 1-D, if I use three indexes (i,j,k) to access the elements, will this degrade the performance for larger values of X,Y,Z?


Solution

  • a[x][y][z] involves 3 memory accesses, but very little computation.

    a[x * X * Y + y * Y + z] involves one memory access, but more computation.

    Generally, memory access is more expensive than a multiplication, very much so if the memory being accessed is not in the CPU cache.

    For instance, if you repeatedly access random locations in an int[][][], the first dimension will likely be in the CPU cache, but the other two won't be, for a total of 2 loads from main memory by element accessed. In contrast, using an int[] will involve only a single load from main memory, so will be up to twice as fast.

    On the other hand, if you access your int[][][] sequentially, you will have very few cache misses, and an int[] will only be marginally faster.

    All that said, switching to a 1-dimensional array makes the code harder to read and write, and unless the arrays are huge and accessed at great frequency, the performance difference will be too minor to matter.

    To verify the theory, you can use the following benchmark:

    public class Benchmark {
    
        public static void benchmark(String label, Code code) {
            print(25, label);
            
            try {
                for (int iterations = 1; ; iterations *= 2) { // detect reasonable iteration count and warm up the code under test
                    System.gc(); // clean up previous runs, so we don't benchmark their cleanup
                    long previouslyUsedMemory = usedMemory();
                    long start = System.nanoTime();
                    code.execute(iterations);
                    long duration = System.nanoTime() - start;
                    long memoryUsed = usedMemory() - previouslyUsedMemory;
                    
                    if (iterations > 1E8 || duration > 1E9) { 
                        print(25, new BigDecimal(duration * 1000 / iterations).movePointLeft(3) + " ns / iteration");
                        print(30, new BigDecimal(memoryUsed * 1000 / iterations).movePointLeft(3) + " bytes / iteration\n");
                        return;
                    }
                }
            } catch (Throwable e) {
                throw new RuntimeException(e);
            }
        }
        
        private static void print(int desiredLength, String message) {
            System.out.print(" ".repeat(Math.max(1, desiredLength - message.length())) + message);
        }
        
        private static long usedMemory() {
            var r = Runtime.getRuntime();
            return r.totalMemory() - r.freeMemory();
        }
    
        @FunctionalInterface
        interface Code {
            /**
             * Executes the code under test.
             * 
             * @param iterations
             *            number of iterations to perform
             * @return any value that requires the entire code to be executed (to
             *         prevent dead code elimination by the just in time compiler)
             * @throws Throwable
             *             if the test could not complete successfully
             */
            Object execute(int iterations);
        }
    
        public static void main(String[] args) {
            benchmark(" int[][][] traversal, sequential", (iterations) -> {
                int n = (int) Math.pow(iterations, 1.0/3);
                int[][][] a = new int[n][n][n];
                for (int x = 0; x < n; x++) {
                    for (int y = 0; y < n; y++) {
                        for (int z = 0; z < n; z++) {
                            a[x][y][z] = x + y + z;
                        }
                    }
                }
                return a;
            });
            benchmark("flat int[] traversal, sequential", (iterations) -> {
                int n = (int) Math.pow(iterations, 1.0/3);
                int[] a = new int[n * n * n];
                for (int x = 0; x < n; x++) {
                    for (int y = 0; y < n; y++) {
                        for (int z = 0; z < n; z++) {
                            a[(x * n + y) * n + z] = x + y + z;
                        }
                    }
                }
                return a;
            });
            benchmark(" int[][][] traversal, unordered " , (iterations) -> {
                int n = (int) Math.pow(iterations, 1.0/3);
                int[][][] a = new int[n][n][n];
                for (int x = 0; x < n; x++) {
                    for (int y = 0; y < n; y++) {
                        for (int z = 0; z < n; z++) {
                            a[z][y][x] = x + y + z;
                        }
                    }
                }
                return a;
            });
            benchmark("flat int[] traversal, unordered ", (iterations) -> {
                int n = (int) Math.pow(iterations, 1.0/3);
                int[] a = new int[n * n * n];
                for (int x = 0; x < n; x++) {
                    for (int y = 0; y < n; y++) {
                        for (int z = 0; z < n; z++) {
                            a[(z * n + y) * n + x] = x + y + z;
                        }
                    }
                }
                return a;
            });
        }
    }
    

    On my computer, this prints:

      int[][][] traversal, sequential     1.406 ns / iteration      4.204 bytes / iteration
     flat int[] traversal, sequential     0.926 ns / iteration      3.984 bytes / iteration
      int[][][] traversal, unordered     11.716 ns / iteration      4.188 bytes / iteration
     flat int[] traversal, unordered      5.487 ns / iteration      3.984 bytes / iteration
    

    As you can see, the performance differences only matter if you do millions of array accesses per second (or dozens of millions, if the access is sequential). The situations where one should replace an int[][][] with an int[] for performance reasons are therefore rare indeed.