string, lua, garbage-collection

Absolute achievable minimum of GC with string concatenation in Lua?


Runtime: Lua 5.1.x compiled for ARM64, no C modules allowed

Example code, ready to run: https://paste.gg/p/anonymous/08f364480a5f470e9da610ab565e11c0

I need to concatenate a bunch of strings every X ms in a loop. From my understanding, Lua supports string interning, which means that string literals are "cached" and not allocated each time. Therefore, only direct calls to tostring() (or the .. operator) will allocate. All other existing string values are passed by reference.

What I've done so far:

  • eliminated all integer->string allocations (via LUT)
  • although tostring(bool) returns an interned string from the cache, I eliminated that too
  • created a pseudo-stringbuilder: a table that is written to by index (~16 B each)
  • "pre-resized" said table to avoid the cost of associative insertion, and made it global so it is not collected and re-created each time
  • used table.concat() for final big string concatenation
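
The steps listed above could look roughly like this. It is a minimal sketch with hypothetical names (INT_TO_STR, buf, build_line) and an assumed integer range of 0..255; it is not the code from the paste link.

```lua
-- Hypothetical names throughout; illustration only.

-- Lookup table: integer -> string, built once at startup so the hot loop
-- never calls tostring() on a number. Assumes values stay within 0..255.
local INT_TO_STR = {}
for i = 0, 255 do
  INT_TO_STR[i] = tostring(i)
end

-- Reusable buffer. Filling it once up front sizes the array part, so later
-- writes by index do not grow the table.
local BUF_SIZE = 8
local buf = {}
for i = 1, BUF_SIZE do
  buf[i] = ""
end

-- Assemble one line from cached pieces. The only new string created here
-- is the result returned by table.concat.
local function build_line(id, value)
  buf[1] = "id="
  buf[2] = INT_TO_STR[id]
  buf[3] = " value="
  buf[4] = INT_TO_STR[value]
  buf[5] = "\n"
  return table.concat(buf, "", 1, 5)
end

print(build_line(7, 42))
```

Passing explicit start and end indices to table.concat lets the same buffer be reused for lines with fewer pieces than BUF_SIZE.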

The final results still make me sad:

Allocated pre-concat: 2.486328125 KB
Allocated post-concat: 39.7451171875 KB
Total table meta bytes: 1544 B
Total tostring meta bytes: 273 B
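
Figures like these can be taken with collectgarbage("count"), which reports the Lua heap size in KB. The helper below is a minimal sketch with a hypothetical name (allocated_kb); it is not necessarily how the numbers above were produced.

```lua
-- Hypothetical helper: returns roughly how many KB `fn` allocated.
local function allocated_kb(fn)
  collectgarbage("collect")  -- start from a freshly collected heap
  collectgarbage("stop")     -- keep the collector from freeing memory mid-measurement
  local before = collectgarbage("count")
  fn()
  local after = collectgarbage("count")
  collectgarbage("restart")
  return after - before
end

-- Example workload: naive concatenation creates a new string on every step.
local kb = allocated_kb(function()
  local s = ""
  for i = 1, 100 do
    s = s .. "x"
  end
end)

print(string.format("allocated: %.3f KB", kb))
```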

Is there something I'm missing, or am I at the limit of Lua here?


Solution

  • You want to minimize the number of intermediate string allocations in order to reduce GC pressure and make collections less frequent. In this case, I suggest limiting yourself to a single call to string.format with the string you want to format:

    • The format string can be declared globally so that it is interned once.
    • The string.format code can be read in lstrlib.c (the str_format function). What this code shows is that the intermediate string transformations are done on the C stack, in a buffer of LUAL_BUFFERSIZE bytes. This size is declared in luaconf.h and can be customized to suit your needs. This approach should be the most efficient for your use case, since it drops all the intermediate steps (table insertions, table.concat, etc.).
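    -- Declared once, so the format string is not re-created on every call.
    local MY_STRING_FORMAT = [[My Very Big String
    param-string-1 %d
    param-string-2 %x
    param-string-3 %f
    param-string-4 %d
    param-string-5 %d
    ]]
    
    -- Param1..Param5 are placeholders for your own numeric values.
    local result = string.format(MY_STRING_FORMAT,
                                 Param1,
                                 Param2,
                                 Param3,
                                 Param4,
                                 Param5) -- etc.: one argument per format specifier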
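
A self-contained variant of the snippet above, as a minimal sketch. The names FORMAT and render, the field names, and the sample values are hypothetical and only illustrate the single-call approach.

```lua
-- Hypothetical format string, declared once outside the hot loop.
local FORMAT = "temp=%d flags=%x ratio=%.2f\n"

-- One string.format call per update: no per-piece tostring() calls and no
-- builder table, only the final result string.
local function render(temp, flags, ratio)
  return string.format(FORMAT, temp, flags, ratio)
end

print(render(21, 255, 0.5))
```

Each call still produces one result string. Since Lua 5.1 interns every string, a result identical to a string that is still alive is reused rather than allocated again.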