How to speed up Numpy sum and Python for-loop?

I have 2 codes that do almost the same thing.

code 1:

from __future__ import division
import numpy as np

m = 1
gamma = 1
lam = 1
alpha = 1
step_num = 2 ** 16
dt = 0.02


def E_and_x(x0):
    xi = x0
    vi = 0
    f = 0
    xsum = 0
    Ei, xavg = 0, 0
    for i in range(step_num):
        vi += f / m * dt / 2
        xi += vi * dt
        f = - gamma * xi - lam * xi ** 2 - alpha * xi ** 3
        vi += f / m * dt / 2
        Ei = 1 / 2 * m * vi ** 2 + 1 / 2 * gamma * xi ** 2 + \
            1 / 3 * lam * xi ** 3 + 1 / 4 * alpha * xi ** 4
        xsum += xi
        xavg = xsum / (i + 1)
    return Ei, xavg

E, x = [], []
for x0 in np.linspace(0, 1, 40):
    mdresult = E_and_x(x0)
    E.append(mdresult[0])
    x.append(mdresult[1])

code 2:

from __future__ import division
import numpy as np
from numba import jit

time = 50
niter = 2 ** 16  # number of iterations
t = np.linspace(0, time, num=niter, endpoint=True)


class MolecularDynamics(object):
    def __init__(self, time, niter, initial_pos):
        self.position = np.array([])
        self.velocity = np.array([])
        self.energy = np.array([])
        self.x_average = np.array([])
        self.vel = 0  # intermediate variable
        self.force = 0  # intermediate variable
        self.e = 0  # intermediate energy
        self.initial_pos = initial_pos  # initial position
        self.pos = self.initial_pos
        self.time = time
        self.niter = niter
        self.time_step = self.time / self.niter
        self.mass = 1
        self.k = 1  # stiffness coefficient
        self.lamb = 1  # lambda
        self.alpha = 1  # quartic coefficient

    @jit
    def iter(self):
        for i in np.arange(niter):
            # step 1 of leap frog
            self.vel += self.time_step / 2.0 * self.force / self.mass
            self.pos += self.time_step * self.vel
            # step 2 of leap frog
            self.force = - self.k * self.pos - self.lamb * self.pos ** 2 - self.alpha * self.pos ** 3
            self.vel += self.time_step / 2.0 * self.force / self.mass

            # calculate energy
            self.e = 1 / 2 * self.mass * self.vel ** 2 + \
                     1 / 2 * self.k * self.pos ** 2 + \
                     1 / 3 * self.lamb * self.pos ** 3 + \
                     1 / 4 * self.alpha * self.pos ** 4

            self.velocity = np.append(self.velocity, [self.vel])  # record vel after 1 time step
            self.position = np.append(self.position, self.pos)  # record pos after 1 time step
            self.energy = np.append(self.energy, [self.e])  # record e after 1 time step
            self.x_average = np.append(self.x_average, np.sum(self.position) / (i + 1))


mds = [MolecularDynamics(time, niter, xx) for xx in np.linspace(0, 1, num=40)]
[md.iter() for md in mds]  # loop to change value
mds_x_avg = [md.x_average[-1] for md in mds]
mds_e = [md.e for md in mds]

Well, the main difference is code 2 uses OO, Numpy and JIT. However, code 2 is much much slower than code 1 (it takes many minutes to compute).

In [1]: %timeit code_1
10000000 loops, best of 3: 25.7 ns per loop

By profiling I know the bottleneck is iter() function and more specifically, on append and sum. But using Numpy is as far as I can do, I wonder why code 2 is much more slower and how can I speed it up?

Solution

In addition to what MSeifert said, you can preallocate the arrays to the correct size instead of appending to them. So instead of creating them like this:

self.position = np.array([])  # No

you would write:

self.position = np.zeros(niter) # Yes

Then instead of appending like this:

self.velocity = np.append(self.velocity, [self.vel])

you would fill in like this:

self.velocity[i] = self.vel

This avoids reallocating the arrays at each iteration (You can do exactly the same with raw python lists BTW using array = [someValue]*size).

Vectorizability

I went on wondering about vectorizability of the OP's algorithm. It seems that it isn't vectorizable. To quote the Cornell Virtual Workshop :

Read after write ("flow") dependency. This kind of dependency is not vectorizable. It occurs when the values of variables in a particular loop iteration (the "read") are determined by a previous loop iteration (the "write").

The "flow" dependency is seen in the loop, where several members' values are determined by the previous state of that same member. For example:

self.vel += self.time_step / 2.0 * self.force / self.mass

Here, self.force is from the previous iteration and was computed from the previous self.vel.