I'm trying to implement PBD algorithm in OpenCL. In the original paper the algorithm is stated as
(1) forall vertices i
(2) initialize x[i]=x0[i], v[i]=v0[i], w[i]=1/m[i]
(3)endfor
(4)loop
(5) forall vertices i do v[i]←v[i]+∆t*w*i*f_ext(x[i])
(6) dampVelocities(v1,...,vN)
(7) forall vertices i do p[i]←x[i]+∆t*v[i]
(8) forall vertices i do generateCollisionConstraints(xi→pi)
(9) loop solverIterations times
(10) projectConstraints(C1,...,CM+Mcoll,p1,...,pN)
(11) endloop
(12) forall vertices i
(13) v[i]←(p[i]−x[i])/∆t(14)x[i]←p[i]
(15) endfor
(16) velocityUpdate(v1,...,vN)
(17)endloop
The step 8 is the most difficulut to efficiently parallelize. What I'm doing currently is to use a regular 3D grid (represented as a single buffer) where I first insert all the particles. Next I iterate neighboring cells and check for collisions. But what do I do when the collision is actually detected? I cannot hold a variable size vector, nor I can "append" constrains to a buffer because that would require atomic operations, which are not parallel. What are some efficient approaches for implementing PBD on GPU?
What I figured out so far is that probably the best approach to implement this is either
atomicAdd
to simulate stack, similarly to the way it's done in face culling https://vkguide.dev/docs/gpudriven/compute_culling/Stack 1: |_|_| (empty stack)
Stack 2: |A|_| (stack with one element)
Stack 3: |B|C| (stack fully filled up to its maximum length)
Then concatenate these stacks like so
|A|_|_|_|B|C|
and now iteratively count how much each element should be shifted, which is equal to the number of vacant elements to its left (inclusive). Create a second buffer that holds the number of those vacant elements. The initial state of this buffer is
|0|1|1|1|0|0| (each cell is 0 if it is empty or 1 otherwise)
In the second iteration you sum values of cells at positions i
and i+1
and store the result in i+1
.
|0|1|2|2|1|0| (each cell holds number of filled cells up to 2 places to the left, including itself)
In the third iteration you sum up values of cells i
and i+2
|0|1|2|3|3|2| (each cell holds number of filled cells up to 4 places to the left, including itself)
In the fourth iteration you sum up values of cells i
and i+4
|0|1|2|3|3|3| (each cell holds number of filled cells up to 8 places to the left, including itself)
Now you know that A
should be shifted by 0
to the left, B
by 3
and C
by 3
, arriving at final result
|A|B|C|_|_|_|