python, pandas, time-series, tuples, pandas-datareader

Python 3.6.5 returns '<' not supported between instances of 'tuple' and 'str' error message


I'm trying to split a data set into training and testing parts. I'm stuck on a structural problem: the hierarchy of the data seems to be wrong for the code below to work.

I tried the following:

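import pandas as pd
import pandas_datareader.data as web  # provides DataReader

data = pd.DataFrame(web.DataReader('SPY', data_source='morningstar')['Close'])
cutoff = '2015-1-1'
# This line raises: '<' not supported between instances of 'tuple' and 'str'
data = data[data.index < cutoff].dropna().copy()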

Solution

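  • As data.head() will reveal, the index of data is a pd.MultiIndex rather than a pd.DatetimeIndex. web.DataReader(...)['Close'] returns a pd.Series with that multi-level index, and wrapping it in pd.DataFrame keeps the index unchanged. The error hints at this: each index element is a tuple, and a tuple cannot be compared with the string cutoff.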

    One fix is to unstack the first index level, which moves it into the columns and leaves the dates as the index:

    df = data.unstack(0)
    

    With that, df[df.index < cutoff] performs the filtering you are trying to do.
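
The steps above can be tried without a network connection using a small synthetic frame. This is only a sketch: the prices are made up, and the index level names "Symbol" and "Date" are assumptions about what the downloaded data looks like. It shows that the index entries are tuples, that unstack(0) leaves a date index that can be compared with the cutoff string, and, as an alternative not taken from the answer, that get_level_values can filter on the date level without reshaping.

```python
import pandas as pd

# Synthetic stand-in for the downloaded data (values are made up).
# Assumption: the real index has the levels (Symbol, Date).
dates = pd.to_datetime(["2014-12-30", "2014-12-31", "2015-01-02", "2015-01-05"])
index = pd.MultiIndex.from_product([["SPY"], dates], names=["Symbol", "Date"])
data = pd.DataFrame({"Close": [100.0, 101.0, 102.0, 103.0]}, index=index)

cutoff = "2015-1-1"

# Each entry of a MultiIndex is a tuple, which is why comparing the
# index directly with a string fails.
first_entry = data.index[0]

# Move the first level (the symbol) into the columns; the dates remain
# as a DatetimeIndex, which can be compared with a date string.
df = data.unstack(0)
train = df[df.index < cutoff]
test = df[df.index >= cutoff]

# Alternative: keep the original shape and compare only the date level.
date_level = data.index.get_level_values("Date")
train_alt = data[date_level < cutoff]
```

Unstacking is convenient when there is a single symbol. The get_level_values variant keeps the original row layout, which may be preferable when the frame holds several symbols.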