python pandas valueerror

python pandas: Why can't I use both index_col and usecols in the same read_csv call? It raises a ValueError


If I read in the CSV file using this code:

    import pandas as pd

    df = pd.read_csv(
        "amazon2.csv",
        names=["year", "state", "month", "number", "date"],
        index_col=["month"],
        usecols=["year", "state", "number"],
        encoding="ISO-8859-1",
    )

it raises a ValueError:

    raise ValueError("Index {col} invalid".format(col=col))
    ValueError: Index month invalid

But it does not raise an error if either usecols or index_col is commented out. Thanks in advance! The data looks like this: amazon2.csv


Solution

  • The error occurs because the index column "month" is not included in the usecols list. read_csv only parses the columns named in usecols, so any column passed to index_col must also appear there.

    df1 = pd.read_csv("test.csv", index_col="month", usecols=["year", "state", "number", "date", "month"])
    

    Output:

              year  state  number      date
    month                       
    Janeiro   1998   Acre       0  1998/1/1
    Janeiro1  1998   Acre       1  1998/1/1
    Janeiro1  1999  Acre2       2  1999/1/1
    Janeiro2  2000   Acre       3  2000/1/1
    Janeiro2  2000  Acre1       4  2000/1/1
    

    But I agree that the index column should not contain duplicated values.
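
To see the cause and the fix side by side, here is a minimal sketch. It assumes an in-memory copy of the five rows shown in the output above instead of the real file, and `read_with_index` is a hypothetical helper name used only for illustration, not a pandas function.

```python
import io

import pandas as pd

# Assumption: in-memory stand-in for the CSV file, using the rows from the output above.
CSV_TEXT = """year,state,month,number,date
1998,Acre,Janeiro,0,1998/1/1
1998,Acre,Janeiro1,1,1998/1/1
1999,Acre2,Janeiro1,2,1999/1/1
2000,Acre,Janeiro2,3,2000/1/1
2000,Acre1,Janeiro2,4,2000/1/1
"""


def read_with_index(text, index_col, usecols):
    """Hypothetical helper: add index_col to usecols if it is missing, then read."""
    cols = list(usecols)
    if index_col not in cols:
        cols.append(index_col)
    return pd.read_csv(io.StringIO(text), index_col=index_col, usecols=cols)


# Reproduce the problem: "month" is requested as index but not listed in usecols.
try:
    pd.read_csv(
        io.StringIO(CSV_TEXT),
        index_col="month",
        usecols=["year", "state", "number"],
    )
except ValueError as exc:
    print("read_csv failed:", exc)

# The fix: the index column is part of usecols, so it can become the index.
df = read_with_index(CSV_TEXT, "month", ["year", "state", "number"])
print(df)

# Check whether the resulting index contains duplicated labels.
print("index is unique:", df.index.is_unique)
```

If the date column is not needed, it can simply be left out of usecols, as here; only the index column has to be added. Note that the columns come back in file order, not in the order given in usecols.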