I am trying to follow a tutorial that runs an ARIMA time-series analysis on differenced data. Here is the Python code:
```python
def difference(dataset):
    diff = list()
    for i in range(1, len(dataset)):
        value = dataset[i] - dataset[i - 1]
        diff.append(value)
    return Series(diff)

series = pd.read_csv('dataset.csv')
X = series.values  # The error in building the list can be seen here
X = X.astype('float32')
stationary = difference(X)
stationary.index = series.index[1:]
...
stationary.plot()
pyplot.show()
```
When execution reaches the plotting stage, I get this error:

```
TypeError: no numeric data to plot
```
Tracing back, I found that the parsed data ends up as a collection of arrays. Saving `stationary` to a *.csv file gives me a list like:

```
[11.]
[0.]
[16.]
[45.]
[27.]
[-141.]
[46.]
```
Can somebody tell me what is going wrong here?
PS: I have left out the library imports.
Edit 1
A section of the dataset is reproduced below:
```
Year,Obs
1994,21
1995,62
1996,56
1997,29
1998,38
1999,201
```
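The bracketed values in the saved file are consistent with `series.values` being a two-dimensional array. Below is a minimal reproduction under an assumption the question does not state: that the file was read with `index_col=0`, so `Year` becomes the index and `Obs` is the only column. The Edit 1 sample is inlined through `io.StringIO` in place of `dataset.csv`. Each row of `X` is then a length-1 array, so every `dataset[i] - dataset[i - 1]` is also a length-1 array, and the resulting Series holds arrays with `object` dtype, which `plot()` does not treat as numeric. Read without `index_col`, as in the code above, `X` would have two columns (`Year` and `Obs`) and each element would be a length-2 array instead.

```python
import io

import numpy as np
import pandas as pd

# Edit 1 sample, inlined so the example runs without dataset.csv
csv_text = """Year,Obs
1994,21
1995,62
1996,56
1997,29
1998,38
1999,201
"""

def difference(dataset):
    """The question's loop: element i minus element i - 1."""
    diff = list()
    for i in range(1, len(dataset)):
        diff.append(dataset[i] - dataset[i - 1])
    return pd.Series(diff)

# Assumption: Year is read as the index, leaving Obs as the only column
series = pd.read_csv(io.StringIO(csv_text), index_col=0)
X = series.values.astype('float32')
print(X.shape)  # (6, 1): a 2-D array with one column, not a 1-D vector

stationary = difference(X)
print(stationary.iloc[0])  # each element is a length-1 array such as [41.]
print(stationary.dtype)    # object dtype, so plot() finds no numeric data
```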
To difference, just use `Series.diff()` or `DataFrame.diff()`. Also, `Year` should be the index:
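```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Year': np.arange(1994, 2000),
     'Obs': [21, 62, 56, 29, 38, 201]})

# Make Year the index, then take first differences (the first row becomes NaN)
stationary = df.set_index('Year').diff()

ax = stationary.plot(legend=False)
# Label the x-axis with whole years instead of fractional ticks
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
plt.show()
```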
Output: a line plot of the differenced `Obs` values against `Year` (image not included).
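If you would rather keep the tutorial's `difference()` loop than switch to `diff()`, here is a sketch that again assumes the inlined Edit 1 sample: select the `Obs` column before converting, so `X` is 1-D and each difference is a plain scalar. The last lines compare the result with the built-in `Series.diff()`, whose first value is `NaN` because 1994 has no predecessor; hence the `dropna()`.

```python
import io

import numpy as np
import pandas as pd

# Edit 1 sample, inlined so the example runs without dataset.csv
csv_text = """Year,Obs
1994,21
1995,62
1996,56
1997,29
1998,38
1999,201
"""

def difference(dataset):
    """Tutorial-style first difference: dataset[i] - dataset[i - 1]."""
    diff = list()
    for i in range(1, len(dataset)):
        diff.append(dataset[i] - dataset[i - 1])
    return pd.Series(diff)

# Use Year as the index so the only column is Obs
df = pd.read_csv(io.StringIO(csv_text), index_col='Year')

# Select the column first: X is now 1-D, so each difference is a scalar
X = df['Obs'].to_numpy(dtype='float32')
stationary = difference(X)
stationary.index = df.index[1:]

# Built-in equivalent; drop the leading NaN so the two line up
builtin = df['Obs'].diff().dropna()

print(stationary.tolist())
```

With the real file, `pd.read_csv('dataset.csv', index_col='Year')` would take the place of the `io.StringIO` call, and `stationary.plot()` then has numeric data to draw.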