I have a 64-tap FIR filter whose output format I am having trouble understanding. The filter has been implemented using (signed) fixed-point math. In {B,F} format, where B is the word length, and F is the fraction length, the filter inputs are {16,0}, and the coefficients are {16,17}. The heart of the filter is as follows:
for (i = 0; i < 32; i++) {
    accumulator += coefficients[i] *
                   (input[(inputIndex + 64 - i) % 64] +
                    input[(inputIndex + 1 + i) % 64]);
}
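For reference, here is a minimal self-contained sketch of the same loop. The types are assumptions that the snippet above does not show: 16-bit samples and coefficients, a 64-bit accumulator, and `inputIndex` pointing at the newest sample of a circular buffer. The function name `fir_symmetric` is made up for illustration.

```c
#include <stdint.h>

#define NUM_TAPS 64

/* Folded (symmetric) FIR: taps i and 63-i share coefficients[i].
 * Assumption: inputIndex is the position of the newest sample in a
 * circular buffer of NUM_TAPS samples. */
int64_t fir_symmetric(const int16_t coefficients[NUM_TAPS / 2],
                      const int16_t input[NUM_TAPS],
                      unsigned inputIndex)
{
    int64_t accumulator = 0;
    for (unsigned i = 0; i < NUM_TAPS / 2; i++) {
        /* The pre-add of two 16-bit samples needs 17 bits; int32_t is ample. */
        int32_t pair = (int32_t)input[(inputIndex + NUM_TAPS - i) % NUM_TAPS] +
                       (int32_t)input[(inputIndex + 1 + i) % NUM_TAPS];
        /* Widen before multiplying: the product can need 33 bits. */
        accumulator += (int64_t)coefficients[i] * pair;
    }
    return accumulator;
}
```

The cast to `int64_t` before the multiply matters: a 16-bit coefficient times a 17-bit pre-add can reach 2^31, which does not fit in a signed 32-bit integer.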
Each iteration of the for loop produces an output whose format is given by:
{16,17} * ( {16,0} + {16,0} ) = {16,17} * {17,0}
= {33,17}
using the rules of fixed-point arithmetic. As there are 32 iterations, it is necessary to add 6 additional bits to the size of the accumulator to prevent overflow. The six bits come from using the (MATLAB) formula:
floor(log2(32)) + 1
as per this document. According to my reasoning, this should result in an output of format {39,17}. Why then does MATLAB report the filter output size as {34,17}? Furthermore, if I want the filter output to be the same format as the input, am I correct in thinking that I need to right-shift by (in the {39,17} case) 22 bits?
This looks fine:
{16,17} * ( {16,0} + {16,0} ) = {16,17} * {17,0}
= {33,17}
With 32 iterations, the sum can grow by 5 additional bits (not 6), because log2(32) = 5, so it's {38,17}. MATLAB's {34,17} couldn't be right for all possible inputs. Is it considering particular inputs or the general case?
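The bit-growth arithmetic can be checked with a small C sketch. The `fmt_t` type and all helper names here are hypothetical, made up for illustration; they are not part of MATLAB or any library. The sketch assumes full-precision signed arithmetic, with no rounding or saturation between steps.

```c
#include <assert.h>
#include <stdint.h>

/* {B,F} format: B = word length in bits, F = fraction length in bits. */
typedef struct { int B; int F; } fmt_t;

/* Sum of two operands with the same fraction length: one bit of growth. */
fmt_t fmt_add(fmt_t a, fmt_t b) {
    assert(a.F == b.F);
    fmt_t r = { (a.B > b.B ? a.B : b.B) + 1, a.F };
    return r;
}

/* Full-precision product: word lengths add, fraction lengths add. */
fmt_t fmt_mul(fmt_t a, fmt_t b) {
    fmt_t r = { a.B + b.B, a.F + b.F };
    return r;
}

/* Bits of growth when summing n terms: ceil(log2(n)). */
int growth_bits(unsigned n) {
    int bits = 0;
    while ((1u << bits) < n) bits++;
    return bits;
}

/* Format of an accumulator that sums n terms of format `term`. */
fmt_t fmt_accumulate(fmt_t term, unsigned n) {
    fmt_t r = { term.B + growth_bits(n), term.F };
    return r;
}

/* Smallest two's-complement word length that can hold v. */
int signed_bits_needed(int64_t v) {
    int bits = 1; /* sign bit */
    /* For negative v, ~v is non-negative and has the same magnitude bits. */
    uint64_t mag = (v < 0) ? ~(uint64_t)v : (uint64_t)v;
    while (mag != 0) { mag >>= 1; bits++; }
    return bits;
}
```

With `in = {16,0}` and `coef = {16,17}`, `fmt_mul(coef, fmt_add(in, in))` gives {33,17}, and `fmt_accumulate` over 32 such terms gives {38,17}. The worst case, every coefficient and every sample at -32768, sums to exactly 2^36, and `signed_bits_needed` reports 38 bits for that value.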
The input format {16,0} is an integer with no fraction. So to achieve the same scale as the input, you merely want to shift the fraction out: with 17 fraction bits, that is a right shift of 17. This truncates. Consider adding 0x10000 (half an output LSB) before shifting, a form of rounding.
If you actually want to match the input {16,0} exactly, you shift right by 22 (possibly adding 0x200000 first to round). This introduces a scale factor of 1/32 in the transfer function (giving away about 30 dB of signal!). Fine if that's what the problem demands.
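A rounding right shift can be sketched in C as follows. The function names are made up for illustration, and the code assumes that `>>` on a negative signed value is an arithmetic shift. The C standard leaves this implementation-defined, though mainstream compilers behave this way.

```c
#include <stdint.h>

/* Right shift with round-half-up: add half of the weight of the last
 * bit being dropped, then shift.
 * Assumption: >> on a negative value is an arithmetic shift. */
int64_t round_shift(int64_t acc, unsigned shift) {
    if (shift == 0) return acc;
    return (acc + ((int64_t)1 << (shift - 1))) >> shift;
}

/* Clamp to the 16-bit range instead of wrapping around. */
int16_t saturate16(int64_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Convert a wide accumulator to a 16-bit output word. */
int16_t accumulator_to_output(int64_t acc, unsigned shift) {
    return saturate16(round_shift(acc, shift));
}
```

The rounding constant is always `1 << (shift - 1)`, which is 0x200000 for a shift of 22. The saturation step only matters when the shift is smaller than the number of bits the accumulator has beyond 16. Clamping is usually less damaging than letting the value wrap.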