multithreading, hardware, gpgpu

What is the purpose of many-core CPUs when we have graphics cards?


My understanding is that for a problem to benefit from multiple cores, it must be possible to split it into many subtasks that do not depend on each other.

But if an algorithm can be split into 8, 16, or 64 subtasks to run on a multicore CPU, what's stopping you from splitting it up further and running it on the graphics card? Wouldn't that be even faster?

What does a many-core CPU do well that a GPU cannot?


Solution

  • Here are two problems I've been working on:

    A: You have a triangle in 3-space with 64,000 dots (xyz) in it. For each dot, compute the vertical distance from the dot to the plane of the triangle, and how much that distance would change if each corner of the triangle were moved up or down. (PerfectTIN)

    B: You have 6542 prime numbers; for each you want to compute a permutation of that many numbers (e.g. for 7 you compute a permutation of 0,1,2,3,4,5,6). To compute the permutation for the prime p, you have to split p-2 into two smaller numbers, factorize them, look up the permutations for their factors, and interleave them in a certain way. (Quadlods)

    Problem A is well suited to both a GPU and a many-core CPU. (I currently run it on a 12-thread CPU but haven't coded it for a GPU yet.) Every GPU core runs the exact same computation, the only difference being the xyz coordinates of the dot. There are no branches, and the loop runs the same number of times for each dot.

    Problem B can be run on a many-core CPU, but is not well-suited to a GPU. The smaller numbers have different numbers of factors, so each core has to run through a loop a different number of times.

    In a GPU, each group of cores runs the same sequence of instructions (not just the same code) on different data. In a multicore CPU, each core can be running the same code, but different sequences of instructions, because of different paths through branches and loops. Cores can also run different code; e.g. one thread reads data from a file into a buffer, while another thread organizes them into some structure.
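
To make the lockstep point concrete, here is a rough toy model in Python. It is only a sketch and not how PerfectTIN or Quadlods actually work. The group width of 32, the fixed trip count of 10 for the "problem A" case, and the use of a prime-factor count as a stand-in for the uneven loop lengths in "problem B" are all assumptions chosen for illustration. The helper names `count_prime_factors` and `lockstep_utilization` are made up for this example.

```python
# Toy model: how much of a lockstep group's work is useful when loop
# trip counts differ between work items. All numbers are illustrative
# assumptions, not measurements of any real GPU or program.

GROUP_WIDTH = 32  # assumed number of lanes sharing one instruction stream


def count_prime_factors(n):
    """Count the prime factors of n with multiplicity (trial division)."""
    count = 0
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            n //= divisor
            count += 1
        divisor += 1
    if n > 1:  # whatever remains is itself prime
        count += 1
    return count


def lockstep_utilization(trip_counts, width=GROUP_WIDTH):
    """Return useful iterations divided by issued lane-iterations.

    Items are processed in groups of `width`. A group keeps stepping
    until its slowest item finishes, so lanes that finished early
    (or are unused in a partial last group) still occupy a slot.
    """
    useful = sum(trip_counts)
    issued = 0
    for start in range(0, len(trip_counts), width):
        group = trip_counts[start:start + width]
        issued += max(group) * width
    return useful / issued


# "Problem A" style: every item loops the same (assumed) number of times.
uniform = [10] * 64000

# "Problem B" style stand-in: trip count varies from item to item.
divergent = [count_prime_factors(n) for n in range(2, 6544)]

print("uniform utilization:  ", lockstep_utilization(uniform))
print("divergent utilization:", lockstep_utilization(divergent))
```

In this model the uniform workload keeps every lane busy, while the divergent one leaves lanes idle whenever a neighbour in the same group has more iterations to run. Independent CPU cores do not pay this particular cost, since each core simply stops its own loop when it is done, although the work may still need balancing across cores.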