How do I create a numba List and populate it in parallel

import numba
from numba import prange
from numba.typed import Dict, List

@numba.njit(parallel=True)
def create_list_of_dicts(input: List[int]):
    output = List()
    for i in prange(len(input)):
        output.append((Dict([(1, 2)]), input[i]))
    return output
len(create_list_of_dicts([i for i in range(1000)]))

crashes my Python interpreter with

double free or corruption (!prev)
Aborted (core dumped)

I assume that's because List.append is not thread-safe. Is there a workaround?

Prepopulating the list with

output = [None] * len(input)

and then assigning to entries via

output[i] = ...

gives

No implementation of function Function(<built-in function setitem>) found for signature:
 
 >>> setitem(list(none)<iv=None>, int64, Tuple(DictType[int64,int64]<iv=None>, int64))

Solution

Numba does not check if a code can be executed safely in parallel (this is actually very hard to do and in some cases simply impossible without additional restriction). Thus, this is your responsibility to do that. Here, there is a race condition due to the shared access on the list. This results in an undefined behaviour and in practice a crash.

There is no efficient way to directly append items to a list in parallel. The usual solution include using critical sections (which is very inefficient and prevent any parallelism so it is useless here), or build local lists so to then merge the result (which would be clearly sub-optimal here. The best solution is indeed to preallocate the list when you know its final size (which is possible here).

Numba complains because you fill the list with the None items of none. Since Numba lists are always typed and natives type do not include this None type, it results in a typing error. Theoretically, you can can specify to Numba that the type can be either none or another specific given type (with the optional type), but this often makes access more complex, possibly less efficient and this is not needed here. Lets first try to create the list with default values.

Building a list WITHOUT optional types

We need to build the list directly with the correct type. In fact, output = List() is not valid since the target list does not have a type matching with the one you want. That being said, the target type is not so simple and there is a catch: with the right type, creating a list with a predefined size force you to create/initialize all the item and so the dictionary objects though they can all reference the same object for sake of performance of the initialization. Here is the resulting code:

import numba as nb

# Type definition
int64 = nb.types.int64
dictType = nb.types.DictType(int64,int64)
itemType = nb.types.Tuple([dictType, int64])

# Creates an empty list with the right type
# This code can be put in the target function.
# lst = nb.typed.typedlist.List.empty_list(item_type=itemType)

@nb.njit(parallel=True)
def create_list_of_dicts(input):
    defaultDict = nb.typed.typeddict.Dict.empty(key_type=int64, value_type=int64)
    output = [(defaultDict,int64(0))] * len(input)
    for i in nb.prange(len(input)):
        output[i] = (({int64(1): int64(2)}, input[i]))
    return output

inputList = nb.typed.List[int64]([i for i in range(1000)])
len(create_list_of_dicts(inputList))

The bad news is that the code is not faster. At first glance, it seems most of the time is still spent in the creation of the list. However, the profiling informations tend to indicate that the problem comes from the conversion of the output from a typed list to a reflected list. This issue is similar to the previous post. Thus, this solution is only useful if you do some expensive computations in the loop.

This problem can be solved by explicitly converting output to a List. So you just need to replace return output with return nb.typed.List(output). The type of the list is automatically inferred in this case. Once fixed, this implementation is significantly faster than the initial one using append since append slow. Using a parallel loop does not significantly improve the execution time. It is because allocations do not scale and also because the parallel loop is too fast compared to the time required to creates threads, distribute the work, etc. If the input range is 10_000, then I can see a benefit of using multiple threads (50-60% speed up) though it is small due to the allocation issue which AFAIK cannot be easily fixed here.

Building a list WITH optional types

I did not succeed so far to build a list of item with an optional type. It is apparently not supported yet for lists. Indeed, the following code show that:

# Works well
lst = nb.typed.typedlist.List.empty_list(itemType)

# Fail
optItemType = nb.types.optional(itemType)
lst = nb.typed.typedlist.List.empty_list(optItemType)

Here is the reported error for the list with optional types:

TypingError: List.item_type cannot be of type OptionalType(Tuple(DictType[int64,int64]<iv=None>, int64))

Note that this clear error is reported only if the line is not in a function. Otherwise, Numba throws a significantly more complex error but the idea is the same : some functions to be implemented are currently missing.

Theoretically, it should also be possible to implement optional types with algebraic types but it looks likes Numba does not support it. At least, I did not find any reference of that in the documentation and I did not succeed to make it works with types like Any so far.

Discussion

For advanced usage like this, I think you reach the limit of Numba and it is not the right tool. Indeed, there is not just one issue but several ones:

List conversions are clearly inefficient so far though you can avoid them;
Allocations are slow in parallel loops so codes tend not to scale well;
There is no way to control allocations so to reduce its overhead;
Optional and algebraic types do not seems to be supported yet though you may not need them eventually;
Manual typing and manual list creation is pretty cumbersome with Numba (verbose and barely documented IMHO).

Maybe a module like Awkward (supporting Numba) can help to fix some issues and it might be enough for you, but I recommand you to use a lower-level language for that (like C++ or Rust).

If you really need to return the output type as a list of tuple containing dictionary, then be aware that creating this data structure will be rather expensive (whatever the method used to do that). You can wrap this in an opaque type so to avoid conversions but then reading/writing items in the data structure will be more cumbersome and less efficient.

Notes

Note that I used int64 but using smaller types might be better in your case (smaller types takes less memory so they can be good for performance too).

Also please note that if Numba was using reflected lists, then the loops it would not be possible to parallelize the operation using Numba threads since all CPython objects needs to be protected by the GIL and the GIL is disabled in Numba parallel loops anyway so no Python objects must be accessed in a Numba parallel loop. Even if it would be possible, then the GIL would prevent any actual parallelism. Not to mention multiprocessing would not be efficient here due to inter-process communication and pickling.

Note that AFAIK Windows and Linux use different default types for integers. Windows tends to use 32-bit integer by default while it tends to be 64-bit on Linux. This is why I specified the types in the code: to avoid warning and type conversion errors.