Tags: java, memory, jvm, memory-efficient

Are arrays of 'structs' theoretically possible in Java?


There are cases when one needs a memory-efficient way to store lots of objects. To do that in Java you are forced to use several primitive arrays (see below why) or one big byte array, which adds some CPU overhead for converting values to and from bytes.
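
For illustration, the byte-array variant could look roughly like the sketch below. PackedPoints and its methods are hypothetical names made up for this example; the only library class used is java.nio.ByteBuffer. The conversion overhead mentioned above is the putFloat/getFloat work done on every access.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs n (x, y) float pairs into one byte[] via ByteBuffer.
final class PackedPoints {
    private static final int BYTES_PER_POINT = 2 * Float.BYTES; // x and y, 4 bytes each

    private final ByteBuffer buf;

    PackedPoints(int n) {
        // A heap buffer is backed by a single byte[] of n * 8 bytes.
        this.buf = ByteBuffer.allocate(n * BYTES_PER_POINT);
    }

    void set(int i, float x, float y) {
        int base = i * BYTES_PER_POINT;
        buf.putFloat(base, x);               // absolute put: float -> 4 bytes
        buf.putFloat(base + Float.BYTES, y);
    }

    float getX(int i) { return buf.getFloat(i * BYTES_PER_POINT); }

    float getY(int i) { return buf.getFloat(i * BYTES_PER_POINT + Float.BYTES); }

    int sizeInBytes() { return buf.capacity(); }

    public static void main(String[] args) {
        PackedPoints pts = new PackedPoints(1000);
        pts.set(42, 1.5f, -2.0f);
        System.out.println(pts.getX(42) + ", " + pts.getY(42));
        System.out.println(pts.sizeInBytes() + " bytes of payload");
    }
}
```

There are no per-point objects or references here, but every read and write goes through an index calculation and a byte-to-float conversion.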

Example: you have a class Point { float x; float y; }. Now you want to store N points in an array, which takes at least N * 8 bytes for the floats plus N * 4 bytes for the references on a 32-bit JVM. So at least 1/3 of that memory is overhead (not counting the normal per-object overhead here). If you stored the same data in two float arrays instead, the reference overhead would disappear.
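
A minimal sketch of the two-float-array layout, together with the arithmetic from the example. PointColumns and its method names are assumptions made for this illustration, and the byte counts are only the rough estimates used in the question (array headers and object headers are ignored).

```java
// Hypothetical "structure of arrays" container: one float[] per field of Point.
final class PointColumns {
    private final float[] xs;
    private final float[] ys;

    PointColumns(int n) {
        xs = new float[n]; // zero-initialized by the JVM
        ys = new float[n];
    }

    void set(int i, float x, float y) { xs[i] = x; ys[i] = y; }

    float x(int i) { return xs[i]; }

    float y(int i) { return ys[i]; }

    int size() { return xs.length; }

    // Payload only: 2 floats per point, ignoring the two array headers.
    static long columnPayloadBytes(int n) {
        return (long) n * 2 * Float.BYTES;
    }

    // The question's estimate for Point[] on a 32-bit JVM: 8 bytes of floats
    // plus a 4-byte reference per point, ignoring per-object headers.
    static long referenceArrayEstimateBytes(int n) {
        return (long) n * (2 * Float.BYTES + 4);
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        PointColumns pts = new PointColumns(n);
        pts.set(0, 3.0f, 4.0f);
        System.out.println("two float[]:      " + columnPayloadBytes(n) + " bytes");
        System.out.println("Point[] estimate: " + referenceArrayEstimateBytes(n) + " bytes");
    }
}
```

The price of this layout is that a point is no longer an object: code has to pass an index around instead of a Point reference.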

My question: Why does Java not optimize the memory usage for arrays of references? I mean why not directly embed the object in the array like it is done in C++?

E.g. marking the class Point final should be sufficient for the JVM to know the maximum size of the data for the Point class. Or where would this be against the specification? This would also save a lot of memory when handling large n-dimensional matrices etc.

Update:

I would like to know whether the JVM could theoretically optimize this (e.g. behind the scenes) and under which conditions, not whether I can force the JVM to do it somehow. I think the second point of the conclusions is the reason it cannot be done easily, if at all.

Conclusions about what the JVM would need to know:

  1. The class needs to be final so that the JVM can determine the size of one array entry
  2. The array needs to be read-only. Of course you can still change the values, e.g. Point p = arr[i]; p.setX(i), but you cannot write to the array via inlineArr[i] = new Point(). Otherwise the JVM would have to introduce copy semantics, which would be against the "Java way". See aroth's answer
  3. How to initialize the array (calling the default constructor or leaving the members initialized to their default values)
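
To make the third point concrete, here is a small sketch of what Java does today: a fresh array of references contains only null, while a fresh primitive array is filled with zeros. An inlined Point array would have no null state, so the JVM would have to pick one of the two options above. ArrayDefaults and its helper methods are hypothetical names used only for this illustration.

```java
final class ArrayDefaults {
    // Point as described in the question, made final as in the first conclusion.
    static final class Point {
        float x;
        float y;
    }

    // True if every slot of a freshly allocated Point[] is null.
    static boolean allNull(int n) {
        Point[] arr = new Point[n];
        for (Point p : arr) {
            if (p != null) return false;
        }
        return true;
    }

    // Sum of a freshly allocated float[]; primitive elements default to zero.
    static float sumOfFresh(int n) {
        float[] arr = new float[n];
        float sum = 0f;
        for (float v : arr) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("fresh Point[] is all null: " + allNull(3));
        System.out.println("sum of fresh float[]: " + sumOfFresh(3));
    }
}
```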

Solution

  • The scenario you describe might save on memory (though in practice I'm not sure it would even do that), but it probably would add a fair bit of computational overhead when actually placing an object into an array. Consider that when you do new Point() the object you create is dynamically allocated on the heap. So if you allocate 100 Point instances by calling new Point() there is no guarantee that their locations will be contiguous in memory (and in fact they will most likely not be allocated to a contiguous block of memory).

    So how would a Point instance actually make it into the "compressed" array? It seems to me that Java would have to explicitly copy every field in Point into the contiguous block of memory that was allocated for the array. That could become costly for object types that have many fields. Not only that, but the original Point instance is still taking up space on the heap, as well as inside of the array. So unless it gets immediately garbage-collected (I suppose any references could be rewritten to point at the copy that was placed in the array, thereby theoretically allowing immediate garbage-collection of the original instance) you're actually using more storage than you would be if you had just stored the reference in the array.

    Moreover, what if you have multiple "compressed" arrays and a mutable object type? Inserting an object into an array necessarily copies that object's fields into the array. So if you do something like:

    Point p = new Point(0, 0);
    Point[] compressedA = {p};  //assuming 'p' is "optimally" stored as {0,0}
    Point[] compressedB = {p};  //assuming 'p' is "optimally" stored as {0,0}
    
    compressedA[0].setX(5);
    compressedB[0].setX(1);
    
    System.out.println(p.x);
    System.out.println(compressedA[0].x);
    System.out.println(compressedB[0].x);
    

    ...you would get:

    0
    5
    1
    

    ...even though logically there should only be a single instance of Point. Storing references avoids this kind of problem, and also means that in any case where a nontrivial object is being shared between multiple arrays your total storage usage is probably lower than it would be if each array stored a copy of all of that object's fields.
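
The difference can be reproduced in today's Java with a rough sketch. withReferences shows what ordinary reference arrays do, and withCopies emulates the hypothetical "compressed" arrays by copying the fields into separate float arrays. SharingDemo, both method names, and the Point constructor and setter are assumptions made to match the snippet above.

```java
import java.util.Arrays;

final class SharingDemo {
    // Point with the constructor and setter that the snippet above assumes.
    static final class Point {
        float x;
        float y;

        Point(float x, float y) { this.x = x; this.y = y; }

        void setX(float x) { this.x = x; }
    }

    // Real Java: both arrays hold references to the same object.
    static float[] withReferences() {
        Point p = new Point(0, 0);
        Point[] a = {p};
        Point[] b = {p};
        a[0].setX(5);
        b[0].setX(1); // last write wins, visible through p, a[0] and b[0]
        return new float[] {p.x, a[0].x, b[0].x};
    }

    // Emulated inlining: each "array" is a float[] holding a copy of p's fields.
    static float[] withCopies() {
        Point p = new Point(0, 0);
        float[] a = {p.x, p.y};
        float[] b = {p.x, p.y};
        a[0] = 5; // changes only the copy stored in a
        b[0] = 1; // changes only the copy stored in b
        return new float[] {p.x, a[0], b[0]};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withReferences())); // [1.0, 1.0, 1.0]
        System.out.println(Arrays.toString(withCopies()));     // [0.0, 5.0, 1.0]
    }
}
```

With references, all three reads see the last write. With copies, the three values drift apart, which is exactly the inconsistency described above.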