I've read you can start an "array" at any index in Lua, but "ipairs" seems to only work if it starts at 1 [EDIT: was 0].
> function printIpairs(a)
>> for i, v in ipairs(a) do
>> print(i .. '=' .. v)
>> end
>> end
> printIpairs({4,5,6})
1=4
2=5
3=6
> s = {}
> s[0] = 4
> s[1] = 5
> s[2] = 6
> printIpairs(s)
1=5
2=6
> t = {}
> t[4] = 4
> t[5] = 5
> t[6] = 6
> printIpairs(t)
>
So my question is, do s and t use "3 words" to store the 3 values internally as a sequence, or more(4/6), bc part/whole is stored as a "mapping" instead? And if they do use only 3 words, how do you find the "starting index" of the "sequence part", if ipairs won't give it to me?
I've read you can start an "array" at any index in Lua
Yes, Lua tables are associative arrays. Almost any key is fine; only not-a-number and nil
are not allowed as keys. So {[-1] = 1, [-2] = 2, [-3] = 3}
is as valid a Lua table as {[1.5] = 1, [2.5] = 2, [3.5] = 3}
. How you "number" your keys is, semantically speaking, just a convention. The Lua convention, which standard library functions work under, is that arrays to be used at lists start with the key 1 and increment in steps of 1.
but "ipairs" seems to only work if it starts at 0.
At 1. But it's easy to write a version of ipairs
which starts at 42 (or 0):
local function inext42(t, i)
i = i + 1
local v = t[i]
if v == nil then return end
return i, v
end
function ipairs42(t)
return inext42, t, 41
end
for i, v in ipairs42({[42] = 1, [43] = 2, [44] = 3}) do
print(i, v)
end
(you could even override the built-in ipairs
, but that's almost always a bad idea, since it will break third-party code)
Similarly, you could implement variants for other standard library functions adhering to your indexing conventions, often in terms of the original functions.
So my question is, do s and t use "3 words" to store the 3 values internally as a sequence, or more(4/6), bc part/whole is stored as a "mapping" instead?
This question is, strictly speaking, not a "Lua" question: The Lua programming language is a mere specification, a reference manual. There is a reference implementation, but there is a variety of implementations and hence probably a variety of memory usage patterns, though I would expect some things to be similar. I'll roughly outline how PUC Lua and LuaJIT would use memory for tables, since these are the most popular implementations.
Yes, PUC Lua, and LuaJIT, separately store a "list" ("sequence") part and a "hash" ("mapping") part. This is necessary to make lists fast and memory-efficient.
(Beware: The "list" part does not need to store a sequence. It may have holes, for example; only at some point, when it gets too sparse and you add a new entry, will PUC Lua move (part of) it to the hash part when "rehashing". Similarly, part of a sequence may live in the hash part, until it gets "dense" enough for PUC Lua to move it to the "list" part on the next "rehashing".)
A table will use many more machine "words" than just 3. Pointers to the list and sequence parts need to be kept. There's a metatable field. The list and sequence part both have capacities that need to be remembered, etc etc; see the sources for details. All in all, an empty table comes to around at least 56 bytes on my machine.
Now, when you fill the array or hash part, allocations need to happen for these. There will perhaps be some allocator overhead.
Allocations for the list part happen in powers of two. This is necessary to make appending items fast (amortized constant time). If tables only used minimal size, you might need to do a linear time reallocation every time you add an element.
This means, for a 3-item table, Lua will use 4 slots. In PUC Lua, each slot will be a type - value pair. This means two "machine words", so 16 bytes on a 64-bit machine. On LuaJIT, values are stored more compactly in just 8 bytes using a technique called "NaN boxing".
Now, if you use [0] = v
in your table, that entry lives in the "hash part" in PUC Lua 5.1. This means a hash part has to be allocated to begin with, and at least 2 slots (4 words) will be needed to store the key-value part alone. All other keys however can live in the array part. In your specific example, the array part can have size exactly two. This means 2 slots (4 words) are saved there. So the two solutions might be comparable in terms of memory usage, though hash part overhead will probably put the former at a disadvantage. Measure using collectgarbage
to see for yourself (you need to be careful to either turn off garbage collection, creating no other garbage, or to make sure the tables you are creating for benchmarking purposes are not garbage).
Now, LuaJIT is special here in that it "wastes" a slot for the 0 key, to make using 0 as performant as not using it, and presumably to not have to subtract 1 when indexing. So on LuaJIT, there is only a minuscule difference between starting at 0 and starting at 1; in fact starting at 0 technically leverages one more slot that would otherwise go unused (but unless your table size is a power of two, this is irrelevant).
Using keys 4-6 means the entire table will land in the hash part on both PUC and JIT. This means at least 4 additional slots for the keys (the hash part will also use a power of two size). This will use the most memory on both PUC and JIT.
And if they do use only 3 words, how do you find the "starting index" of the "sequence part", if ipairs won't give it to me?
Well, they don't, so the only way to find this index is by iterating over the entire table to find it (or by knowing it by convention, or storing it in another field, or caching it, etc etc).
collectgarbage("count")
and potentially collectgarbage("stop")
. Consider moving data structures to a language like C, or using LuaJIT FFI if necessary.