algorithm time-complexity raytracing raycasting

Reducing the O(n²) Time Complexity of a Ray Casting algorithm

I've written a ray casting algorithm based on standard algorithms. The point of intersection is calculated using the Möller-Trumbore algorithm (which reduced the execution time by about 350 % compared to a more simplistic algorithm).

Overall the following steps are executed:

For each triangle, shoot a ray from the light source to the triangle
Check if there are other triangles that intersect with the ray. Get the one with the minimal distance to the origin of the ray (i.e. the light source).
Illuminate the one with the minimal distance (set shaded = false for the triangle)

I do not need different variations of shading; the triangles only need to hold the information if they are shaded at all or not (boolean).

The problem is that for step 2, I need to do the intersection check for all triangles in the scene. In other words, the time complexity is O(n²). However, I've read that it is possible to have a ray casting algorithm with the time complexity O(log n).

I have some ideas to reduce the execution time. For example I could exclude all triangles with a larger distance to the light source than the one the ray is shot at from the calculation which might reduce the execution time by 50 %. But the complexity will still be O(n²) and it won't help much when handling larger amounts of date.

For example, using the ray tracing algorithm on a scene with 100.000 is still possible, but takes about ~10 minutes and that amount will exponentially increase when the scene consists of more triangles.

Is there a way to reduce the time complexity to a lower complexity class without fundamentally changing the way the algorithm works?

Edit: I've implemented a version of Boundary-Volume Hierarchies (BVH) @meowgoesthedog suggested. The triangle-box intersection was a bit tricky to implement, but other than that the theory behind it is rather easy to understand.

I've played around with different numbers of partitions and subpartitions and the results differ greatly, but in most cases the ray casting performs significantly better. There is no universal optimal configuration, so it makes sense to try different numbers for different objects/scenes. In my case, the 4/2 (dividing the room into 4*4*4 boundary boxes containing 2*2*2 subboxes each, i.e. 64 boxes with 8 inner boxes each), 5/2 and 6/2 generally perform well, though for some objects the non-hierarchical partitioning works best (e.g. 10/0).

The amount of required ray-triangle intersection tests can be reduced by up to 97% (maybe more), but higher levels of partition make the creation of the boundary boxes / AABBs pretty costly. With good configuration the program performs up to 4 times faster than the solution without boundary volumes. The better performance is better visible on scenes with a large amount of triangles (more than 10000).

However, my implementation is still relatively naïve and I'm sure there is still much more room for improvement. I'll continue tinkering and update this post if I get good results!

Solution

It depends on what you mean by "fundamentally changing the way" it works. If you mean not changing its behavior, i.e. it's output result and accuracy, then yes.

The way to do so is to use a spatial-hierarchy data structure; this will reduce the search space exponentially, giving you a logarithmic time complexity. Three of the most commonly used such structures are: 1. Octrees, 2. Boundary-Volume Hierarchies (BVH), and 3. KD-trees.

Octrees are very easy to construct, but not memory efficient or as performant as the others. KD-trees are difficult to construct well, but much more memory efficient and give the fastest intersection times. BVH is... somewhere in-between.

For KD-trees, this is a good starting point; this document is well-known in the ray tracing community, and is very good for further research.

(Another structure is the famous Binary-Space Partition (BSP). This gives better performance than all three above. However, constructing an optimal BSP tree is too costly for one-off ray tracing renders.)

Just to give you an idea of the potential gains from even a simple implementation, I used a KD-tree in my own ray-tracing project. At 1920x1080 with a 100K triangle model, simple Lambertian shading and 100 samples per pixel, a render took only 7 seconds. I tried the naive O(n) algorithm with only 1 sample per pixel and 320x240 resolution, and it took 10 minutes.