c++ segmentation-fault sse simd intrinsics

How to debug a _mm_mul_ps function?


I have this code:

inline __m128 process(const __m128 *buffer) {
    __m128 crashTest;
    for (int i = 0; i < mFactor; i++) {
        crashTest = _mm_mul_ps(buffer[i], _mm_set1_ps((float)(((int32_t)1) << 16)));
    }

    return crashTest;
}

When I call it with some "buffer", it crashes the application (segmentation fault).

How can I debug it and find out which value causes the crash? I tried a try/catch, but it doesn't catch the segmentation fault.

I can't "cout" the values, because I'm inside a heavy audio process (that would be 44100 x n cout calls per second, which freezes the I/O).

Any tips?


Solution

  • _mm_mul_ps is not a function. It looks like one, but it compiles into a single instruction: either mulps or vmulps, depending on compiler settings. The output is well defined over the complete range of inputs, and it does the right thing even with unusual values like INF, NAN or denormals.

    If that function crashes, the probable cause is a memory access. The most likely one is an out-of-bounds access to the buffer argument. Another possibility is that the argument is not 16-byte aligned, although that only crashes when the code compiles to the SSE instruction mulps, not the AVX instruction vmulps. In both cases, no amount of printing is going to help: you'll simply move the crash from _mm_mul_ps into your vector printing function.

    If for some reason you can't use a debugger, #include <assert.h> and implement a few checks there.

    Checking the range is unreliable and platform-dependent, but it can be done: use the VirtualQuery API on Windows, or parse the address ranges from the /proc/self/maps text file on Linux.
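
A minimal, Linux-only sketch of the /proc/self/maps approach. The helper name isReadableRange is made up for illustration, and the check is deliberately simple: it only accepts a range that fits inside a single readable mapping, so a buffer spanning two adjacent mappings would be rejected. It reads a file on every call, so it is only suitable for debug builds, not for the real-time audio path.

```cpp
#include <cassert>   // for assert() at the call site
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper (Linux only): returns true when [p, p + bytes)
// lies inside one readable mapping listed in /proc/self/maps.
inline bool isReadableRange(const void* p, std::size_t bytes) {
    const unsigned long long begin = reinterpret_cast<std::uintptr_t>(p);
    const unsigned long long end = begin + bytes;
    if (end < begin) return false;  // address arithmetic wrapped around

    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        // Each line starts with "start-end perms ...", addresses in hex.
        unsigned long long lo = 0, hi = 0;
        char perms[5] = {0};
        if (std::sscanf(line.c_str(), "%llx-%llx %4s", &lo, &hi, perms) != 3)
            continue;
        if (perms[0] == 'r' && begin >= lo && end <= hi)
            return true;
    }
    return false;
}
```

A call site could then look like assert(isReadableRange(buffer, mFactor * sizeof(__m128))); placed before the loop. A true result only means the pages are mapped and readable, not that the memory actually belongs to the intended buffer.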

    Checking for alignment is trivial, though: assert( 0 == ( ((size_t)buffer) % 16 ) );
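
The alignment check can be wrapped in a small helper, roughly as follows. The names isAligned16 and scaleChecked are hypothetical, and scaleChecked only processes the first vector to keep the sketch short. Keep in mind that assert compiles to nothing when NDEBUG is defined, so the check is active in debug builds only.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>

// Hypothetical helper: true when p sits on a 16-byte boundary.
inline bool isAligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) % 16) == 0;
}

// Hypothetical variant of the multiply from the question,
// with the alignment check placed in front of the memory access.
inline __m128 scaleChecked(const __m128* buffer) {
    assert(isAligned16(buffer) && "buffer must be 16-byte aligned for mulps");
    return _mm_mul_ps(buffer[0], _mm_set1_ps(65536.0f));  // 65536 == 1 << 16
}
```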

    P.S. The best long-term solution, however, is to add a buffer size argument. Alternatively, supply another pointer marking the end of the input buffer, or replace the raw pointer with const std::vector<__m128>&. With any of these approaches, you'll be able to detect out-of-bounds access and fail gracefully with an exception instead of crashing the process with an access violation.
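
As a rough sketch of the size-argument variant: the function name processChecked and the parameters count and factor are assumptions made for this example (in the question, mFactor is presumably a class member). The result is zero-initialized so that factor == 0 returns a defined value.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <xmmintrin.h>

// Hypothetical rewrite of process(): 'count' is the number of __m128
// elements in 'buffer', 'factor' stands in for the question's mFactor.
inline __m128 processChecked(const __m128* buffer, std::size_t count,
                             std::size_t factor) {
    assert(buffer != nullptr && "buffer must not be null");
    if (factor > count)
        throw std::out_of_range("factor exceeds the number of vectors in buffer");

    const __m128 scale = _mm_set1_ps(65536.0f);  // same value as (float)(1 << 16)
    __m128 result = _mm_setzero_ps();            // defined even when factor == 0
    for (std::size_t i = 0; i < factor; ++i)
        result = _mm_mul_ps(buffer[i], scale);
    return result;
}
```

The caller can catch std::out_of_range and log the offending sizes once, outside the hot loop, which sidesteps the problem of printing 44100 times per second.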