Search code examples
python-3.xnumpykalman-filtermean-square-errorpykalman

How can we measure RMSE in Python?


I am doing an experiment using Kalman Filters. I have created a very small time series data ready with three columns formatted as follows. The full dataset is attached here for reproduciability since I can't attach a file on stackoverflow:

csv file

  time        X      Y
 0.040662  1.041667  1
 0.139757  1.760417  2
 0.144357  1.190104  1
 0.145341  1.047526  1
 0.145401  1.011882  1
 0.148465  1.002970  1
 ....      .....     .

I have read the documetation of the Kalman Filter and managed to do a simple linear prediction and here is my code

import matplotlib.pyplot as plt 
from pykalman import KalmanFilter 
import numpy as np
import pandas as pd



df = pd.read_csv('testdata.csv')
print(df)
pd.set_option('use_inf_as_null', True)

df.dropna(inplace=True)


X = df.drop('Y', axis=1)
y = df['Y']



estimated_value= np.array(X)
real_value = np.array(y)

measurements = np.asarray(estimated_value)



kf = KalmanFilter(n_dim_obs=1, n_dim_state=1, 
                  transition_matrices=[1],
                  observation_matrices=[1],
                  initial_state_mean=measurements[0,1], 
                  initial_state_covariance=1,
                  observation_covariance=5,
                  transition_covariance=1)

state_means, state_covariances = kf.filter(measurements[:,1]) 
state_std = np.sqrt(state_covariances[:,0])
print (state_std)
print (state_means)
print (state_covariances)


fig, ax = plt.subplots()
ax.margins(x=0, y=0.05)

plt.plot(measurements[:,0], measurements[:,1], '-r', label='Real Value Input') 
plt.plot(measurements[:,0], state_means, '-b', label='Kalman-Filter') 
plt.legend(loc='best')
ax.set_xlabel("Time")
ax.set_ylabel("Value")
plt.show()

Which gives the following plot as an output

enter image description here

As we can see in the plot, the pattern seems to be captured reasonably well. How can we statistically measure the root-mean-square error (RMSE) (the error distance between the red and blue lines in the plot above)? Any help would be appreciated.


Solution

  • Try this!

    from sklearn.metrics import mean_squared_error
    
    mean_squared_error( measurements[:,1], state_means)