Tags: python, pandas, statistics, time-series, statsmodels

Eliminating a string from a dataframe to run a time series seasonal decomposition


Dataframe for time series analysis

I'm trying to run seasonal_decompose on the category column, but I get this error:

  • ValueError: could not convert string to float: 'chocolate: (United States)'

Code:

# Multiplicative Decomposition
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

decomposeM = seasonal_decompose(df1["Category: All categories"], model='multiplicative', extrapolate_trend='freq')
plt.rcParams['figure.figsize'] = (12, 8)
#decomposeM.plot();
decomposeM.plot().suptitle('Multiplicative Decomposition', fontsize=16)

Solution

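  • Converting a column to numeric values is straightforward in pandas with pd.to_numeric. With errors='coerce', any value that cannot be parsed (such as the string 'chocolate: (United States)') becomes NaN instead of raising an error. The pandas documentation for pd.to_numeric describes the other options for the errors parameter.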

    import pandas as pd
    df = pd.DataFrame({'col1':[2, 1.2, 'foo', 'bar']})
    pd.to_numeric(df.col1, errors='coerce')
    

    output:

    0    2.0
    1    1.2
    2    NaN
    3    NaN
    Name: col1, dtype: float64