I'm working on a voxel raytracer in Python that I just ported from Tkinter to Pygame for window management and pixel drawing. I use a multiprocessing pool to raytrace each pixel; in my original code the trace function does various calculations before returning the color as a hex string. The main loop runs periodically on the main thread (e.g. 30 times a second for 30 FPS) and calls the pool with a range to request new traces and update all pixel colors. Each index is translated to a 2D position so I know which location each color refers to. I left out functions unrelated to my question in this simplified example, such as how I convert the index i to x, y integer positions in a custom vector class, and likewise the hex to r, g, b converter. And yes, I have a way to break out of the while True loop when quitting; the representation below runs just as intended.
import multiprocessing as mp
import pygame as pg

def trace(i):
    # Rays are calculated here, for simplicity of example return a fixed color
    return "ff7f00"

pg.init()
screen = pg.display.set_mode((64, 16))
clock = pg.time.Clock()
pool = mp.Pool()

while True:
    # Raytrace each pixel and draw the new color, 64 * 16 = 1024
    result = pool.map(trace, range(0, 1024))
    for i, c in enumerate(result):
        pos = vec2_from_index(i)
        col = rgb_from_hex(c)
        screen.set_at((pos.x, pos.y), (col.r, col.g, col.b))
    pg.display.flip()  # Present the finished frame
    clock.tick(30)
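The two helpers left out of the example are not shown in the post. A minimal sketch of what they might do, assuming a row-major pixel order and using namedtuples in place of the custom vector and color classes (`Vec2`, `Color`, and the `width` default are assumptions made for illustration, not the project's actual code):

```python
from collections import namedtuple

# Hypothetical stand-ins for the custom vector and color classes
Vec2 = namedtuple("Vec2", ["x", "y"])
Color = namedtuple("Color", ["r", "g", "b"])

def vec2_from_index(i, width=64):
    # Assumes row-major order: index 0 is the top-left pixel
    y, x = divmod(i, width)
    return Vec2(x, y)

def rgb_from_hex(hex_str):
    # Split "rrggbb" into three two-digit hexadecimal channels
    r, g, b = (int(hex_str[k:k + 2], 16) for k in (0, 2, 4))
    return Color(r, g, b)

print(vec2_from_index(65))     # Vec2(x=1, y=1)
print(rgb_from_hex("ff7f00"))  # Color(r=255, g=127, b=0)
```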
But there's a problem: performance is very slow on the main thread, which acts as a bottleneck, so the tracing workers don't even get to run at their full potential. At higher resolutions there are a lot more pixels, e.g. 240 x 120 = 28800 entries in the result array. Merely fetching it without doing anything with it already burdens the main thread, and enumerating the result to apply the colors makes it even worse. I'm hoping to remove this burden by setting the pixel being traced directly on the worker that calculates it, instead of the worker merely returning the 6-character hex string and the main thread having to process it. The ideal code would then look something like this instead:
import multiprocessing as mp
import pygame as pg

pg.init()
screen = pg.display.set_mode((64, 16))
clock = pg.time.Clock()

def trace(i):
    # Rays are calculated here, for simplicity of example use a fixed color
    pos = vec2_from_index(i)
    col = rgb_from_hex("ff7f00")
    screen.set_at((pos.x, pos.y), (col.r, col.g, col.b))

# Create the pool after trace is defined so forked workers can find it
pool = mp.Pool()

while True:
    # Raytrace each pixel and draw the new color, 64 * 16 = 1024
    pool.map(trace, range(0, 1024))
    pg.display.flip()  # Present the finished frame
    clock.tick(30)
However, this approach is bound to fail because of how multiprocessing works: each worker in mp.Pool is a separate process with its own copy of the program's memory, so it can only hand data back through the function's return value. It can't edit outside variables such as screen in a way that the main process or the other workers will see. Any changes made by a worker are thus temporary and only exist in that worker's own copy.
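The isolation between pool workers and the main process can be seen without pygame at all. The sketch below uses a plain list as a stand-in for the screen surface (`pixels`, `paint`, and `run_demo` are illustrative names, not part of the project): each worker writes to its own copy of the list, and the parent's copy stays untouched.

```python
import multiprocessing as mp

# Stand-in for the screen surface: one brightness value per pixel
pixels = [0, 0, 0, 0]

def paint(i):
    # Runs in a worker process and modifies that worker's copy of `pixels`
    pixels[i] = 255
    return i, pixels[i]

def run_demo():
    with mp.Pool(2) as pool:
        returned = pool.map(paint, range(len(pixels)))
    # `pixels` here is the parent's copy, which no worker has touched
    return returned, list(pixels)

if __name__ == "__main__":
    returned, parent_view = run_demo()
    print(returned)     # each worker saw its own write
    print(parent_view)  # the parent's list is still all zeros
```

Only the values passed back through `return` reach the main process, which is why the working variants of the raytracer all send pixel data back as a result.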
What do you see as the best solution here, in case anything better than my current approach is possible? Is there a way for the workers to call set_at on the screen surface with permanent results? Also, in this case I wouldn't need the pool to return a result... should I use something other than pool.map for more efficiency?
I managed to find a solution that works well for me and would be happy to share it here as well! I will modify my initial example to show roughly what I did.
import multiprocessing as mp
import pygame as pg
import math

pg.init()
screen = pg.display.set_mode((64, 16))
clock = pg.time.Clock()
threads = mp.cpu_count()
rows = math.ceil(16 / threads)  # Height of the slice each worker draws

def draw_trace(i):
    # Rays are calculated here, for simplicity of example return a fixed color
    return rgb(255, 127, 0)

def draw(thread):
    # Create a new surface and draw every pixel of this worker's slice on it
    srf = pg.Surface((64, rows))
    for i in range(64 * rows):
        pos = vec2_from_index(i)
        # Offset the local index so each slice traces its own part of the image
        col = draw_trace(thread * 64 * rows + i)
        srf.set_at((pos.x, pos.y), (col.r, col.g, col.b))
    return pg.image.tobytes(srf, "RGB")

# Create the pool after the functions are defined so forked workers can find them
pool = mp.Pool()

while True:
    # Raytrace one slice per worker and blit each slice to the screen
    result = pool.map(draw, range(threads))
    for i, s in enumerate(result):
        srf = pg.image.frombytes(s, (64, rows), "RGB")
        screen.blit(srf, (0, rows * i))
    pg.display.flip()  # Present the finished frame
    clock.tick(30)
This achieves exactly what I want, with each worker always handling its own slice of rows: each worker creates its own surface and draws the pixels to it after obtaining their colors. This surface is then packed using tobytes, sent to the main thread by the pool, and unpacked with frombytes. Not doing this causes an error about something called "pickling", which I don't fully understand; as far as I can tell, the pool has to serialize ("pickle") whatever a worker returns in order to pass it between processes, and a pygame Surface can't be pickled while a plain bytes object can. The main thread then calculates where each slice belongs and blits it to the main canvas to update it.
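One detail worth checking in this slicing scheme is what happens when the image height is not an exact multiple of the worker count. With a slice height of ceil(16 / threads) and, say, 6 workers, each slice is 3 rows tall and the slices add up to 18 rows, so some traced rows fall outside the 16-row image. A possible way to compute slices that cover the image exactly, together with the local-to-global index conversion, is sketched below (`strip_bounds` and `global_index` are hypothetical helpers, not taken from the project):

```python
def strip_bounds(height, workers):
    # Split `height` rows into at most `workers` contiguous strips.
    # Returns (y_start, rows) pairs that cover every row exactly once.
    base, extra = divmod(height, workers)
    bounds = []
    y = 0
    for w in range(workers):
        rows = base + (1 if w < extra else 0)
        if rows == 0:
            break  # more workers than rows: the rest get nothing
        bounds.append((y, rows))
        y += rows
    return bounds

def global_index(y_start, local_i, width):
    # Convert an index inside a strip to an index in the full image
    return y_start * width + local_i

print(strip_bounds(16, 4))     # [(0, 4), (4, 4), (8, 4), (12, 4)]
print(strip_bounds(16, 6))     # [(0, 3), (3, 3), (6, 3), (9, 3), (12, 2), (14, 2)]
print(global_index(4, 0, 64))  # 256
```

Each `(y_start, rows)` pair could be passed to the pool instead of a bare worker number, so that every worker knows both the size of its surface and where it will be blitted.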
The performance improvement is very palpable: the pygame clock reports over 15 FPS where I previously had 10, and in practice it feels at least twice as fast! This is likely the last major architectural performance improvement I'll achieve, but it definitely makes the whole project more usable. If anyone's interested in checking it out, you can find my project on GitHub, which now contains this solution: