Assembly 8x8 four quadrant multiply algorithm

In the book "Musical Applications of Microprocessors," the author gives the following algorithm for a four-quadrant multiplication of two 8-bit signed integers with a 16-bit signed result:

Do an unsigned multiply on the raw operands. Then, to correct the result: if the multiplicand is negative, do an unsigned single-precision subtract of the multiplier from the top 8 bits of the raw 16-bit result. If the multiplier is negative, do an unsigned single-precision subtract of the multiplicand from the top 8 bits of the raw 16-bit result.

I tried implementing this in assembler and can't seem to get it to work. For example, if I unsigned multiply -2 times -2, the raw result in binary is B11111100.00000100. When I subtract B11111110 twice from the top 8 bits according to the algorithm, I get B11111110.00000100, not B00000000.00000100 as one would want. Thanks for any insight into where I might be going wrong!

Edit - code:

    #define smultfix(a,b)       \
    ({                      \
    int16_t sproduct;               \
    int8_t smultiplier = a, smultiplicand = b;  \
    uint16_t uproduct = umultfix(smultiplier,smultiplicand);\
    asm volatile (                  \
    "add %2, r1 \n\t"               \
    "brpl smult_"QUOTE(__LINE__)"\n\t"      \
    "sec                 \n\t"      \
    "sbc  %B3, %1            \n\t"      \
    "smult_"QUOTE(__LINE__)": add %1, r1 \n\t"  \
    "brpl send_"QUOTE(__LINE__)"  \n\t"     \
    "sec                 \n\t"      \
    "sbc  %B3, %2            \n\t"      \
    "send_"QUOTE(__LINE__)": movw %A0,%A3 \n\t" \
    :"=&r" (sproduct):"a" (smultiplier), "a" (smultiplicand), "a" (uproduct)\
    );                      \
    sproduct;                   \
    })

Solution

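Edit: You got the subtraction wrong. Subtracting 1111'1110b twice from the top byte 1111'1100b leaves 0000'0000b, because the borrow out of the top byte is simply discarded: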

1111'1110b * 1111'1110b == 1111'1100'0000'0100b
                          -1111'1110'0000'0000b                   
                          -1111'1110'0000'0000b  
                          ---------------------
                                           100b
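
The same byte-wide arithmetic can be checked in C. This is only a sketch: `hi_sub`, `hi_sub_extra_borrow` and the two `example_*` functions are hypothetical names, not part of the code in the question. The second helper models a subtract that takes one extra borrow. AVR's `sbc` computes Rd - Rr - C, so if `sec` is executed right before it, each correction would subtract one more than intended, which might be where the FEh high byte comes from.

```c
#include <stdint.h>

/* Plain 8-bit subtract: the result wraps modulo 100h. */
uint8_t hi_sub(uint8_t hi, uint8_t x)
{
    return (uint8_t)(hi - x);
}

/* 8-bit subtract with one extra borrow, i.e. hi - x - 1.
   Assumption: this models "sec" followed by "sbc" on AVR. */
uint8_t hi_sub_extra_borrow(uint8_t hi, uint8_t x)
{
    return (uint8_t)(hi - x - 1);
}

/* High byte of -2 * -2 after both corrections, plain subtracts. */
uint8_t example_plain(void)
{
    uint8_t hi = 0xFC;          /* top byte of FEh * FEh = FC04h */
    hi = hi_sub(hi, 0xFE);      /* FCh - FEh = FEh (mod 100h)    */
    hi = hi_sub(hi, 0xFE);      /* FEh - FEh = 00h               */
    return hi;
}

/* Same corrections, but each one takes an extra borrow. */
uint8_t example_extra_borrow(void)
{
    uint8_t hi = 0xFC;
    hi = hi_sub_extra_borrow(hi, 0xFE);  /* FCh - FEh - 1 = FDh */
    hi = hi_sub_extra_borrow(hi, 0xFE);  /* FDh - FEh - 1 = FEh */
    return hi;
}
```

With plain subtracts the high byte ends up as 00h, giving 0004h overall. With the extra borrow it ends up as FEh, which matches the B11111110.00000100 reported in the question.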

Otherwise your algorithm is correct. When both operands are negative, you need to subtract 100h times the sum of the two raw bytes from the raw product. Writing the two's-complement bytes as (100h-a) and (100h-b), where a and b are the magnitudes, I get:

(100h-a)(100h-b) = 10000h - 100h*(a+b) + ab = 100h*(100h-a) + 100h*(100h-b) + ab (mod 10000h)
(100h-a)(100h-b) - 100h*(100h-a) - 100h*(100h-b) = ab (mod 10000h)
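
For reference, here is a minimal C sketch of the whole correction, checked against ordinary signed multiplication for every operand pair. The names `umult8`, `smult8` and `smult8_check_all` are made up for illustration, `umult8` merely stands in for whatever unsigned multiply routine is available, and the final cast assumes the usual two's-complement conversion to `int16_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 8x8 -> 16 multiply (hypothetical stand-in for the real routine). */
uint16_t umult8(uint8_t a, uint8_t b)
{
    return (uint16_t)((unsigned)a * (unsigned)b);
}

/* Signed 8x8 -> 16 multiply: unsigned product plus high-byte corrections. */
int16_t smult8(int8_t multiplier, int8_t multiplicand)
{
    uint8_t m = (uint8_t)multiplier;      /* raw byte of the multiplier   */
    uint8_t n = (uint8_t)multiplicand;    /* raw byte of the multiplicand */
    uint16_t raw = umult8(m, n);
    uint8_t hi = (uint8_t)(raw >> 8);
    uint8_t lo = (uint8_t)(raw & 0xFF);

    if (multiplicand < 0)
        hi = (uint8_t)(hi - m);   /* plain 8-bit subtract, no borrow in */
    if (multiplier < 0)
        hi = (uint8_t)(hi - n);

    /* Assumes the usual two's-complement conversion to int16_t. */
    return (int16_t)(uint16_t)(((unsigned)hi << 8) | lo);
}

/* Checks all 256 * 256 operand pairs; returns how many were checked. */
long smult8_check_all(void)
{
    long checked = 0;
    for (int a = -128; a <= 127; a++) {
        for (int b = -128; b <= 127; b++) {
            assert(smult8((int8_t)a, (int8_t)b) == a * b);
            checked++;
        }
    }
    return checked;
}
```

Calling `smult8(-2, -2)` gives 4, and `smult8_check_all()` runs through all 65536 pairs without tripping an assertion. The important detail is that each correction is a plain subtract on the high byte only, with any borrow out of it discarded.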