I can't understand this Horner's rule implementation in extended fields GF(p^n)

I'm trying to understand the implementation of the Shamir's Secret Sharing Scheme from this (old) implementation on github, and I'm struggling with Horner's rule in extended fields GF(p^n):

void horner(int n, mpz_t y, const mpz_t x, const mpz_t coeff[])
{
  int i;
  mpz_set(y, x);
  for(i = n - 1; i; i--) {
    field_add(y, y, coeff[i]);
    field_mult(y, y, x);
  }
  field_add(y, y, coeff[0]);
}

Why does add come first and only then mult? What's the algorithm? Why not something like:

  mpz_set(y,coeff[n-1]);
  for(i = n - 2; i!=-1; i--) {
    field_mult(y, y, x);
    field_add(y,y,coeff[i]);
  }

Solution

Translating this horner function with normal addition and multiplication symbols, we get:

y = x;                     // mpz_set(y, x);
for(i = n - 1; i; i--) {
    y = y + coeff[i];      // field_add(y, y, coeff[i]);
    y = y * x              // field_mult(y, y, x);
}
y = y + coeff[0]           // field_add(y, y, coeff[0]);

Hence this computes the following:

You can see it does not compute any polynomial, but it is a variant of Horner's algorithm to compute a monic polynomial.

Now what you propose:

y = coeff[n-1];               // mpz_set(y,coeff[n-1]);
for(i = n - 2; i!=-1; i--) {
    y = y * x;                // field_mult(y, y, x);
    y = y + coeff[i];         // field_add(y,y,coeff[i]);
}

Thus you compute the following:

You can see the highest-order term is missing.

If you want to have all the operations inside the body of the loop, you can. After all, it's only two ways of decomposing a series of alternating instructions differently:

operation    value of y                                    loop iteration
                                                 add-mult loop      mult-add loop
                x                               initialization         n-1
add       x + coeff[n-1]                             n-1               n-1
mult     (x + coeff[n-1]) * x                        n-1               n-2
add      (x + coeff[n-1]) * x + coeff[n-2]           n-2               n-2
mult    ((x + coeff[n-1]) * x + coeff[n-2]) * x      n-2               n-3
          ...etc...

But you need to explicitly initialize y to the value 1 (which is the implicit coeff[n]) so that you can start by multiplying by x and get the correct highest-order term.

y = 1;                        // mpz_set(y,1);
for(i = n - 1; i!=-1; i--) {  // NOTICE n - 1 NOT n - 2
    y = y * x;                // field_mult(y, y, x);
    y = y + coeff[i];         // field_add(y,y,coeff[i]);
}

You can count that you now perform one more multiplication, and it is multiplying 1 * x. On a finite field this is typically done with log and antilog tables, so you might as well avoid such a useless multiplication, especially if you're going to evaluate polynomials a lot.

TL;DR: This way of writing Horner's algorithm puts the last addition and the first multiplication outside of the loop's body. Because the highest-order coefficient is 1 this multiplication is then completely removed.

To clarify: the highest-order term is kept, but is then x^n instead of being 1 * x^n. You spare one multiplication for the exact same result.