Tags: winapi, sse, icc

Win64 SSE code compiled for Win32 produces incorrect performance counter result


I have SSE code that runs correctly and that I usually compile for Win64 (with Intel C++ Compiler 14). The code consists of SSE intrinsics; after it completes, it also reads the performance counter. That last step misbehaves when I compile the same code for Win32.

The operation is simple:

LARGE_INTEGER Count;
QueryPerformanceCounter( &Count );
uint64_t v = Count.QuadPart;
printf( "%llu\n", v );
printf( "%f\n", (double) v ); 

The first printf prints a correct 64-bit value. The second printf prints -1.#IND00.

If I assign v manually, the error goes away.

The code was checked for buffer underruns/overruns and uninitialized accesses. I have no idea what's wrong. The error does not occur on Win64.

The compiler produces the following code for that block:

;;; LARGE_INTEGER Count;
;;; QueryPerformanceCounter( &Count );
    lea       eax, DWORD PTR [1408+esp]                     ;152.1
    push      eax                                           ;152.1
    call      DWORD PTR [__imp__QueryPerformanceCounter@4]  ;152.1
                            ; LOE ebx esi
.B1.94:                     ; Preds .B1.93

;;; uint64_t v = Count.QuadPart;
    mov       eax, DWORD PTR [1408+esp]                     ;153.14
    mov       edi, DWORD PTR [1412+esp]                     ;153.14
    mov       DWORD PTR [24+esp], eax                       ;153.14

;;; printf( "%llu\n", v );
    push      edi                                           ;154.1
    push      eax                                           ;154.1
    push      OFFSET FLAT: ??_C@_05A@?$CFllu?6?$AA@         ;154.1
    call      _printf                                       ;154.1
                            ; LOE ebx esi edi
.B1.344:                    ; Preds .B1.94
    add       esp, 12                                       ;154.1
                            ; LOE ebx esi edi
.B1.95:                     ; Preds .B1.344

;;; printf( "%f\n", (double) v ); 
    mov       DWORD PTR [esp], OFFSET FLAT: ??_C@_03A@?$CFf?6?$AA@ ;155.1
    mov       eax, DWORD PTR [24+esp]                       ;155.1
    mov       DWORD PTR [32+esp], eax                       ;155.1
    mov       DWORD PTR [36+esp], edi                       ;155.1
    fild      QWORD PTR [32+esp]                            ;155.1
    shr       edi, 31                                       ;155.1
    fadd      QWORD PTR [_2il0floatpacket.1575+edi*8]       ;155.1
    fstp      QWORD PTR [4+esp]                             ;155.1
    call      _printf                                       ;155.1
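
For readers less used to x87 code, the listing above can be read as roughly the following C. This is only a sketch: the function name is made up for illustration, and the contents of the two-entry table (0.0 and 2^64) are an assumption about what _2il0floatpacket.1575 holds, since the listing does not show it. The point is that the Win32 build converts the value with the x87 fild instruction.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the fild / shr / fadd / fstp sequence. */
double u64_to_double_via_signed(uint64_t v)
{
    /* Assumed table contents: add 2^64 only when the top bit of v is set. */
    static const double correction[2] = { 0.0, 18446744073709551616.0 };

    /* fild loads the 64 bits as a SIGNED integer. */
    double d = (double)(int64_t)v;

    /* shr edi, 31 extracts the sign bit of the high dword, which indexes
       the table used by fadd. */
    return d + correction[v >> 63];
}
```

The real sequence keeps the intermediate value in an 80-bit x87 register and rounds once at fstp, so for some large inputs the last bit may differ from this double-only version.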

However, if I duplicate this part after the second printf:

QueryPerformanceCounter( &Count );
v = Count.QuadPart;
printf( "%f\n", (double) v );

This printf prints a correct value. The assembly code is a bit different, though:

;;; QueryPerformanceCounter( &Count );
    lea       eax, DWORD PTR [1408+esp]                     ;156.1
    push      eax                                           ;156.1
    call      DWORD PTR [__imp__QueryPerformanceCounter@4]  ;156.1
                            ; LOE ebx esi
.B1.97:                         ; Preds .B1.96

;;; v = Count.QuadPart;
;;; printf( "%f\n", (double) v );
    fild      QWORD PTR [1408+esp]                          ;158.1
    mov       eax, DWORD PTR [1412+esp]                     ;158.1
    shr       eax, 31                                       ;158.1
    mov       DWORD PTR [esp], OFFSET FLAT: ??_C@_03A@?$CFf?6?$AA@ ;158.1
    fadd      QWORD PTR [_2il0floatpacket.1575+eax*8]       ;158.1
    fstp      QWORD PTR [4+esp]                             ;158.1
    call      _printf                                       ;158.1

Solution

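  • Found a solution: _mm_empty() should be called after the SSE computations and before any floating-point code runs. It emits the EMMS instruction, which marks the x87 registers as empty again after intrinsics that operate on __m64 (MMX) values. Without it, the fild in the Win32 listing most likely hits a full x87 register stack and produces the indefinite NaN that printf shows as -1.#IND00. The Win64 build presumably does this conversion without x87 instructions, which would explain why the error does not appear there.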
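
A minimal sketch of where the call goes, not the original code. The function name mmx_then_convert and the small packed addition are hypothetical stand-ins for the real computation, and the USE_MMX guard is an assumption added so the example still compiles on targets without MMX intrinsics.

```c
#include <stdint.h>

#if defined(__MMX__) || defined(_M_IX86)
#include <mmintrin.h>   /* __m64, _mm_set_pi16, _mm_add_pi16, _mm_empty */
#define USE_MMX 1
#else
#define USE_MMX 0       /* no MMX intrinsics on this target: skip that part */
#endif

/* Hypothetical example: do some __m64 work, then convert v to double. */
double mmx_then_convert(uint64_t v)
{
#if USE_MMX
    /* Stand-in for the real computation: one packed 16-bit addition. */
    __m64 a = _mm_set_pi16(4, 3, 2, 1);
    __m64 b = _mm_set_pi16(40, 30, 20, 10);
    volatile int sink = _mm_cvtsi64_si32(_mm_add_pi16(a, b));
    (void)sink;

    /* Clear the MMX state before any x87 floating-point code runs. */
    _mm_empty();
#endif
    /* On Win32 this conversion may be compiled to x87 fild/fstp. */
    return (double)v;
}
```

In the code from the question, this would correspond to calling _mm_empty() once after the SSE part finishes and before the printf( "%f\n", (double) v ) line.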