
Python: fast mean squared error between two large 2D lists


I want to calculate the mean squared error (MSE) between corresponding rows of two very large 2D lists.

x1 = [1,2,3]
x2 = [1,3,5]
x3 = [1,5,9]
x = [x1,x2,x3]
y1 = [2,3,4]
y2 = [3,4,5]
y3 = [4,5,6]
y = [y1,y2,y3]

The expected result is a vector of size 3:

[mse(x1,y1), mse(x2,y2), mse(x3,y3)]
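For reference, each entry is the mean of the squared element-wise differences of one pair of rows. Here is a deliberately slow plain-Python sketch on the small example that makes the expected values explicit. The helper name `row_mse` is hypothetical and used only for illustration:

```python
def row_mse(u, v):
    """Mean of squared differences between two equal-length sequences."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v)) / len(u)

x = [[1, 2, 3], [1, 3, 5], [1, 5, 9]]
y = [[2, 3, 4], [3, 4, 5], [4, 5, 6]]

# One MSE per pair of rows (x1, y1), (x2, y2), (x3, y3)
mses = [row_mse(xi, yi) for xi, yi in zip(x, y)]
print(mses)  # [1.0, 1.6666666666666667, 6.0]
```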

Currently, I am using sklearn.metrics.mean_squared_error like this:

from sklearn.metrics import mean_squared_error
mses = list(map(mean_squared_error, x, y))

This takes an extremely long time: in my real data each xi and yi has length 115, and x and y each contain over a million vectors.


Solution

  • You can use NumPy to compute all row-wise MSEs in a single vectorized operation:

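    import numpy as np

    a = np.array(x)  # your x, as an (n_rows, n_cols) array
    b = np.array(y)  # your y, same shape as a
    mses = ((a-b)**2).mean(axis=1)  # average over each row -> one MSE per row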
    

    To test it on random data of a similar size instead:

    a = np.random.normal(size=(1000000,100))
    b = np.random.normal(size=(1000000,100))
    mses = ((a-b)**2).mean(axis=1)
    

    With a matrix close to your size (1 000 000 x 100 here; your rows have length 115), this takes less than a second on my machine.
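If memory becomes the bottleneck (a 1 000 000 x 115 float64 array takes roughly 0.92 GB, and the one-liner can create full-size temporaries for `a-b` and its square), one option is to process the rows in chunks. This is a sketch of my own, not part of the answer above; the function name `chunked_row_mse` and the `chunk_rows` default are hypothetical choices. It uses `np.einsum("ij,ij->i", d, d)` to sum the squared differences of each row without allocating a separate squared array:

```python
import numpy as np

def chunked_row_mse(a, b, chunk_rows=100_000):
    """Per-row MSE of two equally shaped 2D arrays, computed in row chunks.

    The temporary difference array is at most chunk_rows x n_cols,
    instead of the full n_rows x n_cols matrix.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.ndim != 2 or a.shape != b.shape:
        raise ValueError("a and b must be 2D arrays of the same shape")
    n_rows, n_cols = a.shape
    out = np.empty(n_rows, dtype=np.float64)
    for start in range(0, n_rows, chunk_rows):
        stop = start + chunk_rows  # slicing past the end is safe in NumPy
        d = a[start:stop] - b[start:stop]
        # Row-wise sum of d*d, then divide by the row length to get the mean
        out[start:stop] = np.einsum("ij,ij->i", d, d) / n_cols
    return out
```

On the small example from the question, `chunked_row_mse(x, y)` should give 1.0, about 1.667 and 6.0 for the three rows. A larger `chunk_rows` trades more memory for fewer Python-level loop iterations.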