
Fast 1-bit transparency blit on ARM Cortex-M4 with DSP instructions


I'd like to copy bytes from a bitmap to memory, writing only the bytes whose value is not equal to a given transparency byte value.

Schematically, I'd like to do:

for (const char *src = start; src < end; src++, dst++)
{
    /* VALUE marks a transparent byte: leave *dst untouched for those */
    if (*src != VALUE) {
        *dst = *src;
    }
}

That is, copy only the bytes that differ from VALUE, in C or assembly (or C back-translated from assembly).

To make this faster, I've considered using 32-bit loads, a SEL operation between src and dst, and a 32-bit store. However, SEL takes its per-byte mask from APSR.GE, so I need a way to set those flags first.

If I'm not wrong, a SIMD comparison against VALUE (using USUB8) only tells me whether each byte is >= or < VALUE; it can't test whether they're equal. (Of course, you could restrict VALUE to 0 or 255 and call it a day...)
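To make the GE behaviour concrete, here is a small portable C model of the flag side of USUB8. This is a sketch only: the helper name `usub8_ge` is hypothetical, and on a Cortex-M4 the real instruction writes these bits to APSR.GE instead of returning them.

```c
#include <stdint.h>

/* Hypothetical helper: models only the GE flags that USUB8 Rd, a, b
   would produce. Bit i of the result is set when byte lane i of a is
   >= byte lane i of b (unsigned subtraction without borrow). */
static unsigned usub8_ge(uint32_t a, uint32_t b)
{
    unsigned ge = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint8_t x = (uint8_t)(a >> (8 * lane));
        uint8_t y = (uint8_t)(b >> (8 * lane));
        if (x >= y) {
            ge |= 1u << lane;
        }
    }
    return ge;
}
```

With VALUE = 0x20 replicated to 0x20202020, `usub8_ge(0x10203040, 0x20202020)` returns 0x7: the lane that equals VALUE (0x20) gets the same flag as the lanes holding 0x30 and 0x40, so a single subtraction against VALUE cannot single out the equal bytes.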

Another possibility would be to use a precomputed mask for src and then set APSR.GE manually (is that possible?), but 1) it uses memory, 2) the data isn't always available in advance, and 3) I'm not sure it would really be faster than byte-by-byte access.


Solution

  • The exact syntax escapes me for now, but how about something like this:

    • load four bytes from existing image into Ra (LDR)
    • load four bytes from source image into Rb (LDR)
    • XOR Ra with a mask that has VALUE in each of its four bytes, so that bytes equal to VALUE become 0 (EOR)
    • XOR Rb with same mask as above (EOR)
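    • Do a USUB8 that subtracts Rb from a register holding 0 to set the GE flags; a byte's GE bit ends up set only where that byte of Rb is 0 (USUB8)
    • Use SEL to pick the existing image byte (Ra) where GE is set and the source image byte (Rb) elsewhere, writing the result to Rc (SEL)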
    • XOR Rc with mask again to restore original bytes (EOR)
    • Write Rc back into existing image (STR)
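The steps above can be sketched in portable C so the logic can be checked off-target. Everything named here (`usub8_ge`, `sel_bytes`, `blit_transparent`) is a hypothetical stand-in, not an existing API. One deliberate difference from the list: the XOR is applied only to a scratch copy of the source word, so the existing image word needs no XOR and no restoring XOR is required afterwards. That is an assumed simplification, relying on SEL reading only the GE flags and not the subtraction result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the GE flags from USUB8 Rd, a, b:
   bit i is set when byte lane i of a >= byte lane i of b. */
static unsigned usub8_ge(uint32_t a, uint32_t b)
{
    unsigned ge = 0;
    for (int lane = 0; lane < 4; lane++) {
        if ((uint8_t)(a >> (8 * lane)) >= (uint8_t)(b >> (8 * lane))) {
            ge |= 1u << lane;
        }
    }
    return ge;
}

/* Hypothetical model of SEL Rd, a, b: take the byte from a where the
   lane's GE bit is set, otherwise the byte from b. */
static uint32_t sel_bytes(unsigned ge, uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t mask = 0xFFu << (8 * lane);
        out |= ((ge >> lane) & 1u) ? (a & mask) : (b & mask);
    }
    return out;
}

/* Copy src to dst, skipping bytes equal to the transparency value. */
void blit_transparent(uint8_t *dst, const uint8_t *src, size_t n, uint8_t value)
{
    const uint32_t key = value * 0x01010101u;   /* value in all four lanes */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint32_t d, s;
        memcpy(&d, dst + i, 4);                 /* LDR existing image word */
        memcpy(&s, src + i, 4);                 /* LDR source image word   */
        uint32_t t = s ^ key;                   /* EOR: transparent lanes -> 0 */
        unsigned ge = usub8_ge(0, t);           /* USUB8 0 - t: GE only where t == 0 */
        uint32_t out = sel_bytes(ge, d, s);     /* SEL: keep dst where transparent */
        memcpy(dst + i, &out, 4);               /* STR */
    }
    for (; i < n; i++) {                        /* byte-wise tail */
        if (src[i] != value) {
            dst[i] = src[i];
        }
    }
}
```

On a Cortex-M4 build, the two helpers would presumably be replaced by the CMSIS intrinsics `__USUB8(0, t)` followed by `__SEL(d, s)`, which pass the mask through APSR.GE. Whether this beats the byte-wise loop is something to measure on the actual target.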