Search code examples
arraysmultidimensional-arrayjuliajagged-arraysragged

Is Julia's Vector{Vector{T}} stored contiguously in memory?


To motivate my question, consider the case when dealing with jagged arrays (for simplicity's sake) of element type Int in Julia. There are two ways to store them:

  1. As Vector{Vector{Int}}
  2. As Vector{Union{Vector{Int}, Int}} (especially, if one expects to store a sufficiently large number of 1-element vectors)

My question is which one is more efficient / faster / better?

To answer it, among the other things, I need to know how each is stored in memory. Namely:

  1. I presume that variable of a type Vector{Vector{Int}}, would be considered homogeneous type array, and therefore I would expect it to be stored contiguously in memory, and as such to be more cpu-cache-friendly. Am I right? Or contiguity only applies to arrays whose elements' data type is primitive?

  2. Would variable of a type Vector{Union{Vector{Int}, Int}} considered heterogeneous array, and as such stored not contiguously in memory?

  3. How benefit of contiguous representation in memory is compared to the benefit of not having array container for 1-element arrays members, i.e. storing them as primitive data type (Int in this case)? Which one yields more efficiency?


Solution

  • Julia's arrays will only store elements of type T unboxed if isbits(T) is true. That is, the elements must be both immutable and pointer-free. An easy way to see if the elements are being stored immediately is by allocating an uninitialized array. Contiguous arrays of unboxed (immediate) values will have gibberish:

    julia> Array(Int, 3)
    3-element Array{Int64,1}:
     4430901168
     4470602000
     4430901232
    

    whereas arrays of non-isbits types will have #undef pointers:

    julia> Array(Vector{Int}, 3)
    3-element Array{Array{Int64,1},1}:
     #undef
     #undef
     #undef
    

    Imagine what would happen if the latter returned one contiguous chunk of Ints. How would it know how big to make it? Or where one vector stopped and the next began? It would depend upon the sizes of the vectors, which isn't known yet.

    A Vector{Union{Vector{Int}, Int}} will similarly store its elements as pointers; this time it's because Julia doesn't know how to interpret each element inline (should it read the memory like an integer or like an array?). It has the additional disadvantage that Julia no longer knows what type it'll return from indexing. This is a type-instability, and will certainly be much worse for performance than just using one-element vectors.

    It is possible to create your own ragged array type that stores its elements inline, but it's very tricky to make it work with the standard library like a normal array since it breaks a lot of assumptions about how indexing works. You can take a look at my latest attempt: RaggedArrays.jl. You can see how I compare it to previous efforts in Issue#2.