Tags: tensorflow, neural-network, computation

How does the amount of GFLOPS affect the training speed of a neural network


If one GPU/CPU has twice as many GFLOPS as the other, does that mean that a neural network will train twice as fast on that device?


Solution

  • FLOPS, or floating-point operations per second, is a measure of performance: how fast the device can perform calculations. A GFLOPS is simply a billion FLOPS. So a GPU with twice the GFLOPS rating is likely to speed up training, but a factor of 2 is roughly an upper bound, because other parts of the system do not depend on raw computing power: memory bandwidth, RAM, and even conditions like the cooling of your GPU/CPU (yes, thermal throttling can affect calculation speed). The first question to ask is what fraction of the training time is actually spent in GPU/CPU computation — you can measure this with the profiler sketch at the end of this answer. If it is 80%, you can speed up training significantly; if it is 20%, probably not. If you are sure that most of the time is taken by GPU calculations, the next thing to know is what determines the FLOPS figure:

    1. Number of cores. If the device has more cores, it has more FLOPS (more parallel computation), but this helps only if your workload is highly parallelizable and a GPU with, say, half as many cores was not enough to perform all of those operations at once. If that is the case and you can now run twice as many calculations in parallel, training speeds up. This matters most for large convolutional networks and is less effective for fully connected or recurrent ones.
    2. Core frequency. If the GPU's cores run at a higher clock frequency, they calculate faster. This part is straightforward: a higher frequency speeds up training for any type of neural network.
    3. Architecture. You have probably heard of GPU architectures like Pascal, Tesla and others. The architecture affects how many instructions are executed in a single clock cycle, while the frequency above tells you how many cycles happen per second. So if an architecture delivers twice the FLOPS, it is likely to reduce training time in much the same way as the previous points. A simple empirical check is to time a few training steps on each device, as in the sketch right after this list.
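    If you have access to both devices, the most reliable comparison is empirical: train the same model on each and measure throughput in steps per second. Below is a minimal timing sketch, assuming TensorFlow 2.x; the layer sizes, batch size and synthetic data are arbitrary placeholders, not part of the original question.

```python
# Minimal throughput check (assumes TensorFlow 2.x); run it on each device you
# want to compare. Model and data sizes here are placeholders.
import time
import tensorflow as tf

# Small stand-in model; swap in your real network to get meaningful numbers.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu", input_shape=(512,)),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Synthetic data keeps the input pipeline out of the measurement.
x = tf.random.normal((8192, 512))
y = tf.random.uniform((8192,), maxval=10, dtype=tf.int32)

model.fit(x, y, batch_size=256, epochs=1, verbose=0)   # warm-up run

start = time.perf_counter()
model.fit(x, y, batch_size=256, epochs=5, verbose=0)
elapsed = time.perf_counter() - start
steps = 5 * (8192 // 256)
print(f"~{steps / elapsed:.1f} training steps per second")
```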

    Thus it is hard to say exactly how much you will gain from a higher FLOPS figure. If you use two GPUs, you double the available FLOPS, much like point 1 above; a sketch of two-GPU training follows this paragraph. Using two GPUs also doubles the GPU memory, which helps if a single GPU did not have enough and the code had to move data to and from host memory frequently.
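    As a rough illustration of the two-GPU case, here is a sketch of data-parallel training with tf.distribute.MirroredStrategy (the standard TensorFlow 2.x API — an assumption, since the question does not pin down a framework version). The model and data are placeholders.

```python
# Data-parallel training across all visible GPUs with MirroredStrategy
# (assumes TensorFlow 2.x; model and dataset are placeholders).
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()          # picks up all visible GPUs
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():                               # variables mirrored on each GPU
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(1024, activation="relu", input_shape=(512,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Scale the global batch size with the number of replicas so each GPU stays busy.
global_batch = 256 * strategy.num_replicas_in_sync
x = tf.random.normal((8192, 512))
y = tf.random.uniform((8192,), maxval=10, dtype=tf.int32)
model.fit(x, y, batch_size=global_batch, epochs=5)
```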

    Thus, the effect of FLOPS on training speed is quite complex: it depends on many factors, such as how parallel your network is, how the higher FLOPS figure is achieved, memory usage and more. A profiling sketch that shows where the training time actually goes is given below.
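    To see what fraction of a training step is spent in GPU kernels versus host-side work and the input pipeline, you can trace a few batches with the TensorFlow profiler. A minimal sketch, assuming TensorFlow 2.x with TensorBoard available; the log directory, batch range and model are arbitrary choices.

```python
# Trace a few training batches so TensorBoard's Profile tab can show how
# wall-clock time splits between GPU kernels, host work and the input pipeline
# (assumes TensorFlow 2.x; sizes and paths are placeholders).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu", input_shape=(512,)),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

x = tf.random.normal((8192, 512))
y = tf.random.uniform((8192,), maxval=10, dtype=tf.int32)

# Profile batches 10-20 of the first epoch and write the trace to logs/profile.
tb = tf.keras.callbacks.TensorBoard(log_dir="logs/profile", profile_batch=(10, 20))
model.fit(x, y, batch_size=256, epochs=1, callbacks=[tb], verbose=0)
# Inspect with: tensorboard --logdir logs/profile   (open the Profile tab)
```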