How to perform division of two Q15 values in Verilog , with out using '/' (division) Operator?

As division operation (/) is expensive in case of FPGA ? Is it possible to perform division of two Q15 format numbers(16 bit fixed point number) with basic shift operations?

Could someone help me by providing some example?

Thanks in advance!

Solution

Fixed-point arithmetic is just integer arithmetic with a bit of scaling thrown in. Q15 is a purely fractional format stored as a signed 16-bit integer with scale factor of 2¹⁵, able to represent values in the interval [-1, 1). Clearly, division only makes sense in Q15 when the divisor's magnitude exceeds the dividend's magnitude, as otherwise the quotient's magnitude exceeds the representable range.

Before embarking on a custom Verilog implementation of fixed-point division, you would want to check your FPGA vendor's library offerings as a fixed-point library including pipeline division is often available. There are also opens source projects that may be relevant, such as this one.

When using integer division operators for fixed-point division, we need to adjust for the fact that the division will remove the scale factor, i.e (a * 2^scale) / (b * 2^scale) = (a/b), while the correct fixed-point result is (a/b * 2^scale). This is easily fixed by pre-multiplying the dividend by 2^scale, as in the following C implementation:

int16_t div_q15 (int16_t dividend, int16_t divisor)
{
     return (int16_t)(((int32_t)dividend << 15) / (int32_t)divisor);
}

Wikipedia gives a reasonable overwiew on how to implement binary division on a bit-by-bit basis using add, subtract, and shift operations. These methods are closely related to the longhand division taught in grade school. For FPGAs, the use of the non-restoring method if often preferred, as pointed out by this paper, for example:

Nikolay Sorokin, "Implementation of high-speed fixed-point dividers on FPGA". Journal of Computer Science & Technology, Vol. 6, No. 1, April 2006, pp. 8-11.

Here is C code that shows how the non-restoring method may be used for the division of 16-bit two's-complement operands:

/* bit-wise non-restoring two's complement division */
void int16_div (int16_t dividend, int16_t divisor, int16_t *quot, int16_t *rem)
{
    const int operand_bits = (int) (sizeof (int16_t) * CHAR_BIT);
    uint16_t d = (uint16_t)divisor;
    uint16_t nd = 0 - d; /* -divisor */
    uint16_t r, q = 0; /* remainder, quotient */
    uint32_t dd = (uint32_t)d << operand_bits; /* expanded divisor */
    uint32_t pp = dividend; /* partial remainder */
    int i;

    for (i = operand_bits - 1; i >= 0; i--) {
        if ((int32_t)(pp ^ dd) < 0) {
            q = (q << 1) + 0;   /* record quotient bit -1 (as 0) */
            pp = (pp << 1) + dd;
        } else {
            q = (q << 1) + 1;   /* record quotient bit +1 (as 1) */
            pp = (pp << 1) - dd;
        }
    }
    /* convert quotient from digit set {-1,1} to plain two's complement */
    q = (q << 1) + 1;

    /* remainder is upper half of partial remainder */
    r = (uint16_t)(pp >> operand_bits);

    /* fix up cases where we worked past a partial remainder of zero */
    if (r == d) { /* remainder equal to divisor */
        q = q + 1;
        r = 0;
    } else if (r == nd) { /* remainder equal to -divisor */
        q = q - 1;
        r = 0;
    }

    /* for truncating division, remainder must have same sign as dividend */
    if (r && ((int16_t)(dividend ^ r) < 0)) {
        if ((int16_t)q < 0) {
            q = q + 1;
            r = r - d;
        } else {
            q = q - 1;
            r = r + d;
        }
    }
    *quot = (int16_t)q;
    *rem  = (int16_t)r;
}

Note that there are multiple ways of dealing with the various special cases that arise in non-restoring division. For example, one frequently sees code that detects a zero partial remainder pp and exits the loop over the quotient bits early in this case. Here I assume that an FPGA implementation would unroll the loop completely to create a pipelined implementation, in which case early termination is not helpful. Instead, a final correction is applied to those quotients that are affected by ignoring a partial remainder of zero.

In order to create a Q15 division from the above, we have to make just a single change: incorporating the up-scaling of the dividend. Instead of:

uint32_t pp = dividend; /* partial remainder */

we now use this:

uint32_t pp = dividend << 15; /* partial remainder; incorporate Q15 scaling */

The resulting C code (sorry, I won't provide read-to-use Verilog code) including the test framework is:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <limits.h>
#include <math.h>

/* bit-wise non-restoring two's complement division */
void q15_div (int16_t dividend, int16_t divisor, int16_t *quot, int16_t *rem)
{
    const int operand_bits = (int) (sizeof (int16_t) * CHAR_BIT);
    uint16_t d = (uint16_t)divisor;
    uint16_t nd = 0 - d; /* -divisor */
    uint16_t r, q = 0; /* remainder, quotient */
    uint32_t dd = (uint32_t)d << operand_bits; /* expanded divisor */
    uint32_t pp = dividend << 15; /* partial remainder, incorporate Q15 scaling */
    int i;

    for (i = operand_bits - 1; i >= 0; i--) {
        if ((int32_t)(pp ^ dd) < 0) {
            q = (q << 1) + 0;   /* record quotient bit -1 (as 0) */
            pp = (pp << 1) + dd;
        } else {
            q = (q << 1) + 1;   /* record quotient bit +1 (as 1) */
            pp = (pp << 1) - dd;
        }
    }
    /* convert quotient from digit set {-1,1} to plain two's complement */
    q = (q << 1) + 1;

    /* remainder is upper half of partial remainder */
    r = (uint16_t)(pp >> operand_bits);

    /* fix up cases where we worked past a partial remainder of zero */
    if (r == d) { /* remainder equal to divisor */
        q = q + 1;
        r = 0;
    } else if (r == nd) { /* remainder equal to -divisor */
        q = q - 1;
        r = 0;
    }

    /* for truncating division, remainder must have same sign as dividend */
    if (r && ((int16_t)(dividend ^ r) < 0)) {
        if ((int16_t)q < 0) {
            q = q + 1;
            r = r - d;
        } else {
            q = q - 1;
            r = r + d;
        }
    }
    *quot = (int16_t)q;
    *rem  = (int16_t)r;
}

int main (void)
{
    uint16_t dividend, divisor, ref_q, res_q, res_r;
    double quot, fxscale = (1 << 15);

    dividend = 0;
    do {
        printf ("\r%04x", dividend);
        divisor = 1;
        do {
            quot = trunc (fxscale * (int16_t)dividend / (int16_t)divisor);
            /* Q15 can only represent numbers in [-1, 1) */
            if ((quot >= -1.0) && (quot < 1.0)) {
                ref_q = (int16_t)((((int32_t)(int16_t)dividend) << 15) / 
                                  ((int32_t)(int16_t)divisor));
                q15_div ((int16_t)dividend, (int16_t)divisor, 
                         (int16_t *)&res_q, (int16_t *)&res_r);
                if (res_q != ref_q) {
                    printf ("!r dividend=%04x (%f) divisor=%04x (%f)  res=%04x (%f)  ref=%04x (%f)\n", 
                            dividend, (int16_t)dividend / fxscale, 
                            divisor, (int16_t)divisor / fxscale, 
                            res_q, (int16_t)res_q / fxscale, 
                            ref_q, (int16_t)ref_q / fxscale);
                }
            }
            divisor++;
        } while (divisor);
        dividend++;
    } while (dividend);

    return EXIT_SUCCESS;
}