
Optimizing for-enumerate loop with numba - optimized version is slower


I am trying to optimize this function with numba:

from typing import Dict, List

import numpy as np
from numba import jit


@jit(nopython=False, forceobj=True)
def _packing_loop(data: List[str], indices: np.ndarray, strings: List[str], offsets: List[int], str_map: Dict[str, int], acc_len):
    for i, s in enumerate(data):
        # handle null strings.
        if not s:
            indices[i] = -1
            continue

        index = str_map.get(s)
        if index is None:
            # increment the length
            acc_len += len(s)

            # store the string and index
            index = len(strings)
            strings.append(s)
            # strings += s,
            str_map[s] = index

            # write the offset
            offsets.append(acc_len)
            # offsets += acc_len,

        indices[i] = index

The issue is that the "optimized" code is ~1.5 times slower than the plain Python version (even when I call the function once before benchmarking, so compilation time is excluded).

What could be the reason? I would also be grateful for any suggestions on how to actually optimize this function.

P.S. I am not limited to Numba; other approaches are also fine.


Solution

  • forceobj generates inefficient code. To quote the documentation:

    If true, forceobj forces the function to be compiled in object mode. Since object mode is slower than nopython mode, this is mostly useful for testing purposes.

    You should not use it in production. Besides, Numba does not support reflected lists well; you should use typed lists (numba.typed.List) for the sake of performance, as in the sketch below. In fact, typing is what makes Numba fast: without types, Numba cannot generate compiled code and a slow interpreted version is executed instead. Note that strings are barely supported and clearly not efficient yet. AFAIK, Numba does not benefit from mypy-style type annotations like List[str]. I advise you to use Cython in this case. That being said, the speed-up will certainly be small since you are mostly dealing with slow CPython dynamic objects.
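
As an illustration only, here is a minimal sketch of what a typed-container version could look like in nopython mode. It is not the answer author's code: the name _packing_loop_typed and the setup values are made up, nulls are represented as empty strings because a typed list of strings cannot hold None, and acc_len is returned explicitly since integers are passed by value.

import numpy as np
from numba import njit, types
from numba.typed import Dict as NumbaDict, List as NumbaList


@njit
def _packing_loop_typed(data, indices, strings, offsets, str_map, acc_len):
    # data/strings/offsets are numba.typed.List instances, str_map is a
    # numba.typed.Dict, indices is an int64 NumPy array.
    for i in range(len(data)):
        s = data[i]
        if len(s) == 0:
            # An empty string stands in for a null value here.
            indices[i] = -1
            continue

        if s in str_map:
            index = str_map[s]
        else:
            # Accumulate the total character length and register the new string.
            acc_len += len(s)
            index = len(strings)
            strings.append(s)
            str_map[s] = index
            offsets.append(acc_len)

        indices[i] = index
    # Return the accumulated length so the caller can see the updated value.
    return acc_len


data = NumbaList(["foo", "bar", "", "foo"])
indices = np.empty(len(data), dtype=np.int64)
strings = NumbaList.empty_list(types.unicode_type)
offsets = NumbaList.empty_list(types.int64)
str_map = NumbaDict.empty(key_type=types.unicode_type, value_type=types.int64)
acc_len = _packing_loop_typed(data, indices, strings, offsets, str_map, 0)

Whether this actually beats the pure-Python loop has to be measured on your data; as noted above, string-heavy code of this kind is often better served by Cython or by staying in plain CPython.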