Search code examples
lualua-table

Why do Lua tables avoid gaps when initializing the table differently?


Why do Lua tables (rehashes?) avoid gaps when I use a different syntax?

inspect function

d = require "core/modules/inspect"

Case 1: Standard syntax - 1st element is a gap

t = {1,2,3}
t[1] = nil
d(t)
__________
{ nil, 2, 3 }

Case 2: Brackets syntax is - no gaps

 t = {
    [1] = 1,
    [2] = 2,
    [3] = 3,
}
t[2] = nil
d(t)
__________
{ 1,
  [3] = 3
}

Case 3: Dynamic array - no gaps

t = {}
t[1] = 1
t[2] = 2
t[3] = 3
t[2] = nil
d(t)
__________
{ 1,
  [3] = 3
}

Case 4: Dynamic array when set nil in 1st element - is a gap

t = {}
t[1] = 1
t[2] = 2
t[3] = 3
t[1] = nil
d(t)
__________
{ nil, 2, 3 }

Case 5: Brackets syntax set nil in 1st element is still no gaps

t = {
    [1] = 1,
    [2] = 2,
    [3] = 3,
}
t[1] = nil
d(t)
__________
{
  [2] = 2,
  [3] = 3
}

Solution

  • Lua does not define behaviour for the length operator for non-sequential indexes, but the accepted answer to Why does Lua's length (#) operator return unexpected values? goes into what practically happens in the standard implementation of Lua 5.2. That said, that answer on its own doesn't completely explain the behaviour, so here's a case-by-case explanation, using Lua 5.3.5's standard implementation.

    Note: I use the length operator to clearly show cases where holes are occuring. Regarding how this applies to the inspection function you've used, from what I can see of its behaviour it simply explicitly states any indexes outside of the range 1 <= index <= #table.

    Case 1:

    > t = {1,2,3}
    > t[1] = nil
    > print(#t)
    3
    

    The size of a Lua table's array part is based on a power of two - except when a table is constructed with pre-set values. The line t = {1,2,3} creates a table with an array part of size 3. When you perform t[1] = nil, the array part has no reason to re-size, so its size remains at 3. As the last item in the array part is non-nil, and the hash part is empty, the array size - 3 - is returned.

    Case 2:

    > t = {[1] = 1, [2] = 2, [3] = 3}
    > t[2] = nil
    > print(#t)
    1
    

    If you're defining table values within the table constructor, Lua will associate any key within square brackets with the hash part of the table. So, the array part of the table is actually empty. The implementation performs a binary search on the hash part of the array, and gets 1.

    Case 3:

    > t = {}
    > t[1] = 1
    > t[2] = 2
    > t[3] = 3
    > t[2] = nil
    > print(#t)
    1
    

    When square brackets are used outside of the constructor, Lua will check numerical indexes to see if they should go into the array part or the hash part. In this case, each of the indexes will go into the array part, but because these values are defined outside of the constructor, the array will be sized on assignment based on powers of two. So, after the initial three assignments the array part will have a size of 4. The nil assignment doesn't trigger a re-size, so it stays at 4. Thus, when the length operator is applied, it sees that the final value in the array's space is nil, and so it binary searches for the index before the first nil value, which is 1.

    Case 4:

    > t = {}
    > t[1] = 1
    > t[2] = 2
    > t[3] = 3
    > t[1] = nil
    > print(#t)
    3
    

    This case is exactly the same case Case 3, but the binary search sees that t[2] is non-nil and so doesn't consider the values at any index before 2. This results in the search continuing on the indexes after 2, which concludes that the length is 3.

    Case 5:

    > t = {[1] = 1, [2] = 2, [3] = 3}
    > t[1] = nil
    > print(#t)
    0
    

    This is like Case 2 in the sense that the table items are members of the hash part of the table. In this case, however, the search process results in 0. This is because before attempting the binary search, the function determines what range of integer indexes it should search over. The presence of a nil value at the first index means that the loop which determines the range doesn't run. For more information on how that search process works, you can see the function here. When I say the loop doesn't run, I am specifically referring to the while condition, !ttisnil(luaH_getint(t, j)).

    Bonus case:

    > t = {}
    > t[1] = 1
    > t[4] = 4
    > print(#t)
    1
    > g = {}
    > g[1] = 1
    > g[3] = 3
    > g[4] = 4
    > print(#g)
    4
    

    In Case 3 I mentioned the fact that Lua decides whether to put a numerical index in the array part or the hash part. In the composition of the table t above, index 1 is in the array part and 4 in the hash part. As such, the length returned is 1. In the case of g, all values are placed into the array part. This is because when assigning values to indexes, Lua tries to re-size the array part in an optimal way. For a greater understanding of this re-sizing process, you can view the source here.