python sympy symbolic-math linear-equation

sympy - symbolic sum over symbolic number of elements


What is the most appropriate way to express the following in SymPy:

A sum over samples 'x[i]' with 'i' going from concrete 0 to symbolic 'N'. 'x[i]' itself shall be symbolic, i.e. always appear as variable.

The goal is to use these expressions in a system of linear equations.

Example (trivial least squares approx.):

Consider a set of samples (x[i], y[i]) that are supposed to lie on a line 'y = m*x + a', so the estimated line is determined by 'm' and 'a'. The error between the samples and the estimated line may be given by

 error(m, a) = sum((m * x[i] + a - y[i]) ** 2, start_i=0, end_i=N)

Now, finding the zeros of the derivatives 'd/dm error(m,a)' and 'd/da error(m,a)' gives the 'm' and 'a' that minimize the error. How can I find this solution with SymPy?


Solution

  • Given your later question, I assume you have already figured most of this out, but for clarity's sake: the samples are modelled as functions of the index (which makes sense, since a sequence of samples is really a function defined on part of the integers), so the notation is x(i). The sum can be built with the summation function or the Sum constructor. summation is usually the more convenient of the two, because it evaluates whatever it can right away: for a constant addend c, summation(c, (i, 0, n)) gives c*(n + 1), whereas Sum stays unevaluated until you call .doit() on it.

    >>> from sympy import *
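    >>> m, a, i, n = symbols('m a i n')
    >>> # x and y must be undefined functions; a plain Symbol is not callable in current SymPy
    >>> x, y = symbols('x y', cls=Function)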
    >>> err = summation((m * x(i) + a - y(i)) ** 2, (i, 0, n))
    >>> pprint(err)
      n
     ___
     ╲
      ╲                      2
      ╱   (a + m⋅x(i) - y(i))
     ╱
     ‾‾‾
    i = 0
    

    Once summation has the addend expression and the (index, lower bound, upper bound) tuple, you can differentiate the resulting sum like any other expression:

    >>> diff(err, m)
    Sum(2*(a + m*x(i) - y(i))*x(i), (i, 0, n))
    >>> diff(err, a)
    Sum(2*a + 2*m*x(i) - 2*y(i), (i, 0, n))
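
The two derivatives above still have m and a inside the sums, so they cannot be handed to a solver as they are. One possible way to finish, sketched here under assumptions, is to split each sum term by term by hand and give every remaining sum its own symbol. The names K, S_x, S_y, S_xx and S_xy are hypothetical shorthand introduced only for this sketch; they are not part of the original answer or of SymPy.

```python
from sympy import symbols, Eq, solve, simplify

# Unknown line parameters.
m, a = symbols('m a')

# Hypothetical shorthand symbols (assumptions for this sketch), i = 0..n:
#   K    = n + 1 (number of samples)
#   S_x  = sum of x(i),      S_y  = sum of y(i)
#   S_xx = sum of x(i)**2,   S_xy = sum of x(i)*y(i)
K, S_x, S_y, S_xx, S_xy = symbols('K S_x S_y S_xx S_xy')

# d/dm error = 0, divided by 2 and split term by term.
eq_m = Eq(m*S_xx + a*S_x - S_xy, 0)
# d/da error = 0, divided by 2 and split term by term.
eq_a = Eq(m*S_x + a*K - S_y, 0)

# The system is linear in m and a, so there is a single solution.
sol = solve([eq_m, eq_a], [m, a], dict=True)[0]
m_hat = simplify(sol[m])
a_hat = simplify(sol[a])

# Sanity check with three samples on y = 2*x + 1: (0, 1), (1, 3), (2, 5).
data = {K: 3, S_x: 3, S_y: 9, S_xx: 5, S_xy: 13}
print(m_hat.subs(data), a_hat.subs(data))
```

Because the three test samples lie exactly on y = 2*x + 1, the substitution should recover m = 2 and a = 1. The solution is only defined when K*S_xx - S_x**2 is nonzero, that is, when the x samples are not all equal.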