Tags: c, performance, floating-point, double, x87

Doubles a lot faster than floats in C


I was trying to figure out whether using floats in some C code would be precise enough for my needs, but after searching and not really understanding how bits of precision translate into actual numbers, I decided to just write a small test case and see what the results were.

Floats seem precise enough, but I was quite surprised that the float version took about 70% longer to run on my i7-4700HQ Haswell processor (Windows 8.1 x64, C, MSVS v120). I would have expected the running times to be similar, or floats to be faster, but clearly not. I turned off all optimizations: still the same. I tried a debug build: still the same. Targeting AVX2 or SSE made no difference either.

Doubles take about 197 seconds to run and floats 343 seconds.

I've glanced through the Intel® 64 and IA-32 Architectures Software Developer’s Manual, but given its size and my lack of expertise, I haven't managed to glean an answer from it. I then looked at the disassembly of both versions, but didn't notice any glaring differences to my untrained eye.

So, does anyone know why this is the case? Here's the code I used, with the only change being from double to float for everything except the anError variable.

#include <errno.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <omp.h>



int main( void ) {

    clock_t start = clock() ;

    // body of a candle
    double open = 0.500000 ;
    double close = 0.500001 ;
    double end = 1 ;

    uint64_t resultCounter = 0 ;
    double anError = 0 ;

    while (open < end){
        while(close < end){
            // count how many times the result is positive; should be 0
            double res = open - close ;
            if (res > 0 ) { 
                resultCounter++ ; 
                if (anError < fabs( res )) { anError = res ;    }
            }
            close = close + 0.000001 ;
        }
        open = open + 0.000001 ;
        close = open + .000001 ;
    }

    clock_t finish = clock() ;
    double duration = ((double) (finish - start)) / CLOCKS_PER_SEC;
    double iterations = (((end - .50000) / .000001) * ((end - .50000) / .000001)) ;
    fprintf( stdout, "\nTotal processing time was %f seconds.\n", duration ) ;
    fprintf( stdout, "Error is %f. Number of times results were incorrect %llu out of %f iterations.\n", 
        anError, resultCounter, iterations ) ;

    return 0  ;
}

EDIT: The lack of an f suffix on the constants turned out to be the cause (thanks, Joachim!). Apparently a floating constant without the f suffix is actually a double! Another of C's quirks that likes to bite the ignorant in the butt. I'm not sure what the rationale behind this oddity is, but shrug. If anyone wants to write up a good answer to this so I can accept it, feel free.
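
For reference, here is roughly what the float version looks like once every constant carries an f suffix. This is a sketch along the lines of the code above, not the exact program that was timed, and it prints the counter with PRIu64 from <inttypes.h> rather than %llu.

#include <inttypes.h>
#include <math.h>
#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main( void ) {

    clock_t start = clock() ;

    // body of a candle, with f-suffixed constants so the arithmetic
    // stays in single precision throughout
    float open = 0.500000f ;
    float close = 0.500001f ;
    float end = 1.0f ;

    uint64_t resultCounter = 0 ;
    double anError = 0 ;

    while (open < end){
        while (close < end){
            // count how many times the result is positive; should be 0
            float res = open - close ;
            if (res > 0.0f) {
                resultCounter++ ;
                if (anError < fabsf( res )) { anError = res ; }
            }
            close = close + 0.000001f ;
        }
        open = open + 0.000001f ;
        close = open + 0.000001f ;
    }

    clock_t finish = clock() ;
    double duration = ((double) (finish - start)) / CLOCKS_PER_SEC ;
    fprintf( stdout, "\nTotal processing time was %f seconds.\n", duration ) ;
    fprintf( stdout, "Error is %f. Number of incorrect results: %" PRIu64 ".\n",
        anError, resultCounter ) ;

    return 0 ;
}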


Solution

  • According to the C standard:

    An unsuffixed floating constant has type double. If suffixed by the letter f or F, it has type float. If suffixed by the letter l or L, it has type long double.

    More details about floating-point constants here. So:

    num is just a float

    float num = 1.0f;
    float num = 1.0F;
    

    a double gets converted to float and stored in num

    float num = 1.0;
    

    a float gets converted to double and stored in num

    double num = 1.0f;
    double num = 1.0F;
    
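    A quick way to see the rule in action is the small sketch below. It assumes a C11 compiler with _Generic support (recent GCC or Clang; the MSVC 2013 toolset from the question does not have it):

    #include <stdio.h>

    // _Generic picks a branch based on the static type of its argument
    #define TYPE_NAME(x) _Generic((x), \
        float: "float",                \
        double: "double",              \
        long double: "long double")

    int main(void) {
        printf("1.0  has type %s\n", TYPE_NAME(1.0));   // double
        printf("1.0f has type %s\n", TYPE_NAME(1.0f));  // float
        printf("1.0L has type %s\n", TYPE_NAME(1.0L));  // long double
        return 0;
    }
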

    The performance is worse with floats because an unsuffixed constant such as 0.000001 is a double. In an expression like close + 0.000001 the float operand is therefore promoted to double, the addition is carried out in double precision, and the result is converted back to float before it is stored, so the inner loop pays for two extra conversions on every iteration. Suffixing the constants with f keeps the whole computation in single precision and removes those conversions.
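
    To make the difference concrete in the question's inner loop, here is a minimal hypothetical fragment (the variable name follows the question's code; the instruction names assume typical unoptimized SSE2 code generation):

    float close = 0.500001f;

    // Unsuffixed constant: 0.000001 is a double, so close is promoted to
    // double, the addition is done in double precision, and the sum is
    // converted back to float before being stored (roughly a cvtss2sd,
    // an addsd, and a cvtsd2ss per iteration).
    close = close + 0.000001;

    // f-suffixed constant: a single-precision addition, no conversions.
    close = close + 0.000001f;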