
Assembly / Neon code crashing


I'm using the following code:

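#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>   /* read(), write(), close() */

/* Prototype for the assembly routine; the argument types are assumed
 * from the call below (width, height, src, src stride, dst, dst stride). */
void __uyvy_luma_extract(unsigned int width, unsigned int height,
                         unsigned char *src, int src_stride,
                         unsigned char *dst, int dst_stride);

int main(void) {
    unsigned char *auyvy = malloc(640 * 480 * 2);
    unsigned char *ay8 = malloc(640 * 480);
    if (auyvy == NULL || ay8 == NULL)
        return 1;

    int fd = open("input.uyvy", O_RDONLY);
    if (fd >= 0) {
        read(fd, auyvy, 640 * 480 * 2);
        close(fd);
    }

    __uyvy_luma_extract(640, 480, auyvy, 640 * 2, ay8, 640);

    /* O_CREAT needs a mode argument; O_TRUNC discards any previous output. */
    fd = open("output.y8", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd >= 0) {
        write(fd, ay8, 640 * 480);
        close(fd);
    }

    free(auyvy);
    free(ay8);
    return 0;
}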

with the two additional files:
https://github.com/emrainey/DVP/blob/master/libraries/public/yuv/__uyvy_luma_extract.S
https://github.com/emrainey/DVP/blob/master/libraries/public/yuv/yuv.inc

I compile with "gcc -g convert.c __uyvy_luma_extract.S -mfpu=neon"

Strangely, the program crashes during the conversion. Any idea what I'm doing wrong?

* FIRST EDIT * I have uploaded a zip file with the various files so that the problem is easy to reproduce on an ARM platform: http://www.gentil.com/tmp/convert.zip

* SECOND EDIT * I have corrected the link to the assembly file, which was wrong.

* THIRD EDIT * gdb gives the following:

Starting program: /home/ai/convert/convert

Program received signal SIGSEGV, Segmentation fault.
0x00008036 in ?? ()
(gdb) bt
#0  0x00008036 in ?? ()
#1  0x000084f2 in __uyvy_luma_extract () at __uyvy_luma_extract.S:38
#2  0x000084f2 in __uyvy_luma_extract () at __uyvy_luma_extract.S:38
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Solution

  • Oh, this is a good one.

    It works fine if built with -marm but breaks with -mthumb. Ubuntu and Android probably have different defaults for this.

    The reason it breaks in Thumb mode is that the assembly function (which is always assembled as ARM code) is missing a type specification for its symbol, so the linker doesn't know that it must use a BLX instruction to call it from Thumb code. When the program runs, the assembly function is therefore erroneously entered in Thumb state.

    The first half-word of this function, 0x47ff, when interpreted as a Thumb instruction, is BLX pc, which is an invalid encoding with UNPREDICTABLE behaviour. Apparently, the Cortex cores simply execute it in the obvious way: they switch to ARM state, branch to the PC value (current instruction + 4 in Thumb state), and store the address of the next (Thumb) instruction in LR. This gives the appearance of having simply skipped the STM instruction the function starts with. Since the registers were never saved and LR no longer holds the caller's return address, the function cannot return correctly, which is consistent with the SIGSEGV and the "corrupt stack?" backtrace above.

    The fix is to add this line to the assembly file:

    .type __uyvy_luma_extract, STT_FUNC