performance tensorflow deep-learning hpc parallelism-amdahl

Should I change my original Python/TensorFlow code to make it run faster on HPC?

Please forgive me if this question is too basic. I am not familiar with the idea of parallelization, and I have never used an HPC system before.

I am training a deep learning model which takes really long on my PC. It takes approximately 2 days on my i5 with 12 GB RAM.

So I decided to use HPC, but one of the tutorials I watched says that if I do not write my code properly, HPC will not be any faster than a regular PC. What does that really mean? Should I adjust my original code so that I can benefit from HPC?

Secondly, can we say that using 30 cores should be 5 times faster than using 6 cores? Is speed and number of cores proportionate?

Solution

Q : "can we say that using 30 cores should be 5 times faster than using 6 cores?"

No, we cannot.

Q : "Is speed and number of cores proportionate?"

No, it is not.

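There is an ultimate ceiling for any (potential) speedup. It is set by Amdahl's Law (even in its original, overhead-naive formulation, which also ignores the atomicity of work): the part of the work that must run serially limits how much the whole job can gain from adding more cores.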
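To put numbers on the 6-versus-30-core question, the classic form of Amdahl's Law is S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the runtime that can be parallelized and N is the number of cores. Below is a minimal sketch of that formula. The value p = 0.95 is an assumption chosen purely for illustration; it is not measured from the model in the question.

```python
def amdahl_speedup(p: float, n_cores: int) -> float:
    """Classic (overhead-naive) Amdahl's Law.

    p       -- fraction of single-core runtime that can run in parallel (0..1)
    n_cores -- number of cores used
    Returns the speedup relative to running on a single core.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be >= 1")
    return 1.0 / ((1.0 - p) + p / n_cores)


P_ASSUMED = 0.95  # hypothetical parallel fraction, for illustration only

s6 = amdahl_speedup(P_ASSUMED, 6)
s30 = amdahl_speedup(P_ASSUMED, 30)
ceiling = 1.0 / (1.0 - P_ASSUMED)  # limit as n_cores grows without bound

print(f"speedup on  6 cores: {s6:.2f}x")
print(f"speedup on 30 cores: {s30:.2f}x")
print(f"30 cores vs 6 cores: {s30 / s6:.2f}x")
print(f"ceiling for p={P_ASSUMED}: {ceiling:.0f}x")
```

Under this assumed p, 30 cores come out about 2.6 times faster than 6 cores, not 5 times, and no number of cores can exceed a 20x speedup. A smaller p makes the gap to "proportional" scaling even wider.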

It is better to use the revised, overhead-strict, resources-aware re-formulation of Amdahl's Law.
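The effect of add-on overheads can be sketched by adding one extra term to the classic formula. This is a simplified model and an assumption on my part, not necessarily the exact re-formulation referred to above: `overhead` is a hypothetical parameter standing for all setup, data-transfer, and teardown costs of going parallel, expressed as a fraction of the original single-core runtime.

```python
def speedup_with_overhead(p: float, n_cores: int, overhead: float) -> float:
    """Amdahl-style speedup with a fixed add-on overhead (simplified model).

    p        -- fraction of single-core runtime that can run in parallel (0..1)
    n_cores  -- number of cores used
    overhead -- hypothetical extra cost of parallel execution (process setup,
                data transfer, teardown), as a fraction of single-core runtime
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be >= 1")
    if overhead < 0.0:
        raise ValueError("overhead must be >= 0")
    # Serial part + parallel part spread over the cores + cost of going parallel.
    return 1.0 / ((1.0 - p) + p / n_cores + overhead)


P_ASSUMED = 0.95  # hypothetical parallel fraction, for illustration only

# Overhead values are illustrative assumptions, not measurements.
for overhead in (0.0, 0.1, 1.0):
    s = speedup_with_overhead(P_ASSUMED, 30, overhead)
    print(f"overhead={overhead:>4}: speedup on 30 cores = {s:.2f}x")
```

With zero overhead the function reduces to the classic formula. Once the overhead is comparable to the original runtime, the result drops below 1, meaning the 30-core run is slower than the plain single-core run.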

There you see.

Looking to improve performance?

Start with this. It is best to spend some time tuning the core parameters in the INTERACTIVE TOOL ( URL there ).

Converting a classical library (like TF or any other) into an HPC-efficient tool is not easy and does not come for free. Add-on overhead costs may easily (see the results in the INTERACTIVE TOOL) devastate any potential HPC gains through poor scaling alone. Going from costs in the range of a few ns to costs above a few ms kills the game, whatever HPC budget you may spend.