masm, sse, visual-c++-2013

Returning a __m128d from MASM procedure to a C caller


I am porting a function from inline assembly to MASM in Visual Studio 2013 and am having trouble getting a return value out of it.

Here are the C++ caller and the prototype of the assembly function:

extern "C" void AbsMax(__m128d* samples, int len, __m128d* pResult);

__m128d AbsMax(__m128d* samples, int len)
{
    __m128d absMax = { 0, 0 };
    AbsMax(samples, len, &absMax);
    return absMax;
}

And the assembly function:

.686              ;Target processor.  Use instructions for Pentium class machines
.xmm

.model flat, c    ;Use the flat memory model. Use C calling conventions
.code             ;Indicates the start of a code segment.

AbsMax proc samples:PTR DWORD, len:DWORD, result:PTR XMMWORD
    ;; Load up registers. xmm0 is min, xmm1 is max. L is Ch0, H is Ch1.
    mov     ecx,  [len]
    shl     ecx,  4
    mov     esi,  [samples]
    lea     esi,  [esi+ecx]
    neg     ecx
    pxor    xmm0, xmm0
    pxor    xmm1, xmm1

ALIGN 16
_loop:
    movaps  xmm2, [esi+ecx]
    add     ecx,  16
    minpd   xmm0, xmm2
    maxpd   xmm1, xmm2
    jne     _loop

    ;; Store larger of -min and max for each channel. xmm2 is -min.
    pxor    xmm2, xmm2
    subpd   xmm2, xmm0
    maxpd   xmm1, xmm2
    movaps  [result], xmm1  ; <=== access violation here

    xor eax, eax
    xor ebx, ebx
    ret
AbsMax ENDP 
END 

As I understand the C calling convention, integer return values normally come back in the EAX register. Since I'm trying to return a 128-bit value, I assumed an out parameter was the way to go. As the assembly listing shows, the store to the out parameter (movaps [result], xmm1) raises an access violation (Access violation reading location 0x00000000). I've checked the address of result in the debugger and it looks fine.

What am I doing wrong?


Solution

  • For educational purposes, I wrote up a version of your function that uses intrinsics:

    #include <immintrin.h>
    
    extern "C" void AbsMax(__m128d* samples, int len, __m128d* pResult)
    {
        __m128d min = _mm_setzero_pd();
        __m128d max = _mm_setzero_pd();
        while (len--)
        {
            min = _mm_min_pd(min, *samples);
            max = _mm_max_pd(max, *samples);
            ++samples;
        }
        *pResult = _mm_max_pd(max, _mm_sub_pd(_mm_setzero_pd(), min));
    }
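
One way to use an intrinsics version like this is as a reference against which the MASM port can be checked. The following is a minimal sketch under that assumption: `AbsMaxRef` is the same routine under a hypothetical name (so it could coexist with the assembly symbol), and `AbsMaxDemo` and its sample data are made up for illustration.

```cpp
#include <immintrin.h>

// Reference routine: same logic as the intrinsics function above,
// under a hypothetical name so it can sit beside the MASM AbsMax.
void AbsMaxRef(const __m128d* samples, int len, __m128d* pResult)
{
    __m128d lo = _mm_setzero_pd();   // running per-channel minimum
    __m128d hi = _mm_setzero_pd();   // running per-channel maximum
    for (int i = 0; i < len; ++i)
    {
        lo = _mm_min_pd(lo, samples[i]);
        hi = _mm_max_pd(hi, samples[i]);
    }
    // Larger of -min and max for each channel.
    *pResult = _mm_max_pd(hi, _mm_sub_pd(_mm_setzero_pd(), lo));
}

// Hypothetical helper: runs the reference on three fixed stereo samples
// and writes the channel 0 and channel 1 peaks to out[0] and out[1].
void AbsMaxDemo(double out[2])
{
    // _mm_setr_pd(low, high): low lane is channel 0, high lane is channel 1.
    const __m128d data[3] = {
        _mm_setr_pd( 0.25, -0.5),
        _mm_setr_pd(-3.0,   1.5),
        _mm_setr_pd( 2.0,  -4.0),
    };
    __m128d result;
    AbsMaxRef(data, 3, &result);
    _mm_storeu_pd(out, result);      // unaligned store, so out needs no alignment
}
```

Feeding the same buffer to the assembly routine and comparing both lanes should expose a mismatch in either the loop or the final store.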
    

    Then I compiled it with the VC++ x64 compiler, using cl /c /O2 /FA absmax.cpp to generate an assembly listing (edited to remove line comments):

    ; Listing generated by Microsoft (R) Optimizing Compiler Version 18.00.31101.0 
    include listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC  AbsMax
    _TEXT   SEGMENT
    samples$ = 8
    len$ = 16
    pResult$ = 24
    AbsMax PROC                     ; COMDAT
        xorps   xmm3, xmm3
        movaps  xmm2, xmm3
        movaps  xmm1, xmm3
        test    edx, edx
        je  SHORT $LN6@AbsMax
        npad   3
    $LL2@AbsMax:
        minpd   xmm2, XMMWORD PTR [rcx]
        maxpd   xmm1, XMMWORD PTR [rcx]
        lea rcx, QWORD PTR [rcx+16]
        dec edx
        jne SHORT $LL2@AbsMax
    $LN6@AbsMax:
        subpd   xmm3, xmm2
        maxpd   xmm1, xmm3
        movaps  XMMWORD PTR [r8], xmm1
        ret 0
    AbsMax  ENDP
    _TEXT   ENDS
    END
    

    x64 uses a __fastcall-style convention by default: the first four integer or pointer arguments are passed in rcx, rdx, r8 and r9, and the caller reserves shadow space for them on the stack. In the listing, the out parameter is written indirectly through r8, which per MSDN carries the third integer parameter in x64 code. I think your assembly code will work if it adopts this parameter convention.

    The shadowed stack space is not initialized with the actual parameter values; it's intended for callees that need a place to stash the values while using the registers. That's why your code ends up dereferencing a zero value: there's a calling convention mismatch. The debugger knows about the calling convention, so it can show you the enregistered value for the parameter.
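
As a side note on the original goal of returning a 128-bit value: the x64 convention documents that `__m128d` results are returned in XMM0, so an out parameter is not strictly required there. Below is a minimal sketch of a by-value variant, assuming an x64 build; `AbsMaxByValue` and `PeakOfBothChannels` are hypothetical names that do not appear in the original code.

```cpp
#include <immintrin.h>

// Hypothetical by-value variant of the intrinsics function.
// On x64 the compiler hands the __m128d result back in XMM0.
__m128d AbsMaxByValue(const __m128d* samples, int len)
{
    __m128d lo = _mm_setzero_pd();   // running per-channel minimum
    __m128d hi = _mm_setzero_pd();   // running per-channel maximum
    for (int i = 0; i < len; ++i)
    {
        lo = _mm_min_pd(lo, samples[i]);
        hi = _mm_max_pd(hi, samples[i]);
    }
    // Larger of -min and max for each channel.
    return _mm_max_pd(hi, _mm_sub_pd(_mm_setzero_pd(), lo));
}

// Hypothetical helper: collapses the two channel peaks into one scalar.
double PeakOfBothChannels(const __m128d* samples, int len)
{
    __m128d peaks   = AbsMaxByValue(samples, len);
    __m128d swapped = _mm_unpackhi_pd(peaks, peaks);   // channel 1 peak in both lanes
    return _mm_cvtsd_f64(_mm_max_sd(peaks, swapped));  // max of the two low lanes
}
```

A hand-written MASM procedure following the same idea would presumably leave its result in xmm0 before `ret`; generating a listing with /FA for this variant is a straightforward way to confirm what the compiler does.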