
ARM vs x86 for floating point


I apologize if I'm asking something very obvious.

Assume you are designing a piece of software which is heavy in floating point computation and you get to buy your own hardware. Assume that you rule out FPGAs and GPUs for reasons of flexibility and ease of code maintenance.

Assume further you have a decent level of parallelism in the software.

For a long time, that meant you were stuck with x86.

I am looking for an objective benchmark that would tell whether modern ARM CPUs are in the same ballpark. Maybe I'm searching wrong, but I find it very difficult to locate a trustworthy benchmark (something like LAPACK or maybe some physical simulation). I understand that performance is task-dependent and that compiler optimizations are probably still better on x86, but at this stage I'm really looking to compare orders of magnitude.

Also, I find it strange that you can't really buy something along the lines of a Raspberry Pi, but with 8-64 modern cores comparable to those in the newest smartphones (like the newest Snapdragons) connected to a single bus. Do correct me if I'm mistaken, but such solutions may one day overtake GPUs in the FLOPS/$ category in addition to being more flexible.


Solution

  • Below are my Linpack Benchmark results for PCs under Linux, Raspberry Pi and Android devices (I have many more under Windows). They are based on my 1996 C/C++ conversion for PCs, which was approved by Jack Dongarra, the original author, and is available at:

    http://www.netlib.no/netlib/benchmark/linpack-pc.c

    This is for a matrix of order 100, in double precision. Results below include some at single precision. Dongarra’s historic results for this and supercomputer varieties are in:

    http://netlib.org/benchmark/performance.pdf

    This is just one benchmark, and others tell a different story. You can obtain many more from my site, including source code and MP varieties (free, with no ads):

    http://www.roylongbottom.org.uk/

    Linux 32/64 Bit Results
    
    Double Precision 100x100 compiled at 32 and 64 bits 
    
                                       Opt    No opt
    CPU                      MHz    MFLOPS    MFLOPS
    
    Atom N455     32b  Ub   1666       196        94
    Atom N455     64b  Ub   1666       226        89
    
    Core 2 Mob    32b  Ub   1830       983       307
    
    Athlon 64     32b  Ub   2211       936       231
    Athlon 64     64b  Ub   2211      1118       221
    
    Core 2 Duo    32b  Ub   2400      1288       404
    Core 2 Duo    64b  Ub   2400      1577       378
    
    Phenom II     32b  Ub   3000      1464       411
    Phenom II     64b  Ub   3000      1887       411
    Phenom II     64b  Fe   3000      1872       407
    
    Core i7 930   64b  Ub   ****      2265       511
    
    Core i7 4820K 32b  Ub   $$$1      2534       988
    Core i7 4820K 64b  Ub   $$$1      3672       900
    Core i7 4820K AVX  Ub   $$$12     5413       935
    
      Ub = Ubuntu Linux,   Fe = Fedora Linux        
     ****  Rated as 2800 MHz but running at up to   
           3066 MHz using Turbo Boost               
     $$$1  Rated as 3700 MHz but running at up to   
           3900 MHz, using Turbo Boost              
     $$$12 As $$$1, but compiled with GCC 4.8.2 that
           produces AVX SIMD instructions.
    

    ######################################################

          Android and Raspberry Pi Versions
    
    Double Precision and Single Precision (SP) 100x100
    
                                   v7/v5       v5 
    CPU          MHz   Android    MFLOPS    MFLOPS
    
    ARM 926EJ    800       2.2       5.7       5.6
    ARM v7-A8    800     2.3.5      80.2          
    ARM v7-A9    800     2.3.4     101.4      10.6
    ARM v7-A9   1300a    4.1.2     151.1      17.1
    ARM v7-A9   1500     4.0.3     171.4          
    ARM v7-A9   1500a    4.0.3     155.5      16.9
    ARM v7-A9   1400     4.0.4     184.4      19.9
    ARM v7-A9   1600     4.0.3     196.5          
    ARM v7-A15  2000b    4.2.2     459.2      28.8
    
                                   v7 SP     Java 
    CPU          MHz   Android    MFLOPS    MFLOPS
    
    ARM 926EJ    800       2.2       9.6       2.3
    ARM v7-A9    800     2.3.4     129.1      33.3
    ARM v7-A9   1300a    4.1.2     201.3      56.4
    ARM v7-A9   1500a    4.0.3     204.6      56.9
    ARM v7-A9   1400     4.0.4     235.5      57.0
    ARM v7-A15  2000b    4.2.2     803.0     143.1
    
    
    Atom   Ax86 1666     2.2.1                15.7
    Core 2 Ax86 2400     2.2.1                53.3
    
    Raspberry Pi                    DP        SP  
    CPU          MHz     Linux    MFLOPS    MFLOPS
    
    ARM  1176    700     3.6.11     42        58  
    ARM  1176   1000     3.6.11     68        88  
    
                                  NEON SP         
    CPU          MHz   Android    MFLOPS          
    
    ARM v7-A9    800     2.3.4     255.8          
    ARM v7-A9   1300a    4.1.2     376.0          
    ARM v7-A9   1500a    4.0.3     382.5          
    ARM v7-A9   1400     4.0.4     454.2          
    ARM v7-A15  2000b    4.2.2    1334.9
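
For readers who want to see how an MFLOPS figure like the ones above is derived, here is a minimal sketch in Python. It is not the linpack-pc.c code: it solves an order-100 system by plain Gaussian elimination with partial pivoting and divides the conventional Linpack operation count, 2/3 n^3 + 2 n^2, by the elapsed time. The function names (`linpack_flops`, `make_system`, `solve`, `measure_mflops`) are hypothetical, and the test matrix has its diagonal boosted to keep the system well-conditioned, which is an assumption of this sketch rather than something taken from the benchmark.

```python
import time


def linpack_flops(n):
    """Conventional Linpack operation count: 2/3 n^3 + 2 n^2."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def make_system(n, seed=1325):
    """Build a deterministic matrix and a right-hand side whose solution is all ones."""
    state = seed
    a = []
    for _ in range(n):
        row = []
        for _ in range(n):
            state = (3125 * state) % 65536  # small LCG, illustrative only
            row.append((state - 32768.0) / 16384.0)
        a.append(row)
    for i in range(n):
        a[i][i] += 4.0 * n  # assumption: boost the diagonal for good conditioning
    b = [sum(row) for row in a]  # row sums, so the exact solution is x = (1, ..., 1)
    return a, b


def solve(a, b):
    """Solve a x = b in place (Gaussian elimination, partial pivoting); returns x."""
    n = len(b)
    for k in range(n):
        # Choose the row with the largest entry in column k as the pivot row.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        pivot_row = a[k]
        for i in range(k + 1, n):
            row = a[i]
            m = row[k] / pivot_row[k]
            for j in range(k, n):
                row[j] -= m * pivot_row[j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    for i in range(n - 1, -1, -1):
        row = a[i]
        s = b[i]
        for j in range(i + 1, n):
            s -= row[j] * b[j]
        b[i] = s / row[i]
    return b


def measure_mflops(n=100, repeats=5):
    """Return (best MFLOPS over the repeats, max error of the last solution)."""
    best = float("inf")
    for _ in range(repeats):
        a, b = make_system(n)
        start = time.perf_counter()
        x = solve(a, b)
        best = min(best, time.perf_counter() - start)
    error = max(abs(v - 1.0) for v in x)
    return linpack_flops(n) / best / 1e6, error


if __name__ == "__main__":
    rate, error = measure_mflops()
    print(f"n=100: {rate:.2f} MFLOPS, max error {error:.2e}")
```

Interpreted Python mostly measures interpreter overhead, so the number this prints is not comparable with the compiled results in the tables. The point is only the arithmetic: a fixed operation count divided by the best wall-clock time. For figures comparable to the ones above, the same timing would have to wrap compiled code such as the C source linked earlier.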