
Select Columns in pandas DF


Below is my data and I am trying to access a column. It was working fine until yesterday, but now I'm not sure if I am doing something wrong:

    DISTRICT;CPE;EQUIPMENT,NR_EQUIPM
0   47;CASTELO BRANCO;17520091VM;101
1   48;CASTELO BRANCO;17520103VV;160
2   49;CASTELO BRANCO;17520103VV;160

When I try this, it gives me an error:

df = pd.read_csv(archiv, sep=",")   
df['EQUIPMENT']  

ERROR:

KeyError: 'EQUIPMENT'

I also tried this, but it doesn't work either:

df.EQUIPMENT

ERROR:

AttributeError: 'DataFrame' object has no attribute 'EQUIPMENT'

BTW, I am using:

Python 2.7.12 |Anaconda 4.1.1 (32-bit)| (default, Jun 29 2016, 11:42:13) [MSC v.1500 32 bit (Intel)]

Any idea?


Solution

  • You need to change sep to ;, because the fields in the CSV are separated by semicolons, not commas:

    df = pd.read_csv(archiv, sep=";") 
    

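    If you check the header row, the last separator is a comma (EQUIPMENT,NR_EQUIPM), so sep=";" alone still leaves those two column names fused together. You can accept both separators with the regular expression [;,], but you then need to add the parameter engine='python', because otherwise pandas raises this warning: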

    ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

    Sample:

    import pandas as pd
    import io
    
    temp=u"""DISTRICT;CPE;EQUIPMENT,NR_EQUIPM
    47;CASTELO BRANCO;17520091VM;101
    48;CASTELO BRANCO;17520103VV;160
    49;CASTELO BRANCO;17520103VV;160"""
    # after testing, replace io.StringIO(temp) with the filename
    df = pd.read_csv(io.StringIO(temp), sep="[;,]", engine='python')
    
    print(df)
       DISTRICT             CPE   EQUIPMENT  NR_EQUIPM
    0        47  CASTELO BRANCO  17520091VM        101
    1        48  CASTELO BRANCO  17520103VV        160
    2        49  CASTELO BRANCO  17520103VV        160
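
A KeyError like the one in the question can usually be diagnosed by printing df.columns before indexing. The following is a minimal sketch, reusing the sample data from above, that shows the check and then the column access working once both separators are accepted. The last part is an alternative not taken from the answer above: it keeps the default C engine by discarding the malformed header and supplying the column names by hand. The variable names bad, good, alt and names are only illustrative.

```python
import io

import pandas as pd

# Same sample as above: the header's last delimiter is a comma, the rest are semicolons.
temp = u"""DISTRICT;CPE;EQUIPMENT,NR_EQUIPM
47;CASTELO BRANCO;17520091VM;101
48;CASTELO BRANCO;17520103VV;160
49;CASTELO BRANCO;17520103VV;160"""

# Wrong separator: inspect the parsed column names before indexing.
bad = pd.read_csv(io.StringIO(temp), sep=",")
print(bad.columns.tolist())  # 'EQUIPMENT' is not a column of its own here

# Regex separator that accepts ';' or ','; this requires the python engine.
good = pd.read_csv(io.StringIO(temp), sep="[;,]", engine="python")
print(good.columns.tolist())
print(good["EQUIPMENT"])

# Alternative (an assumption, not part of the answer above): stay on the C engine,
# drop the malformed header row with header=0 and pass the names explicitly.
names = ["DISTRICT", "CPE", "EQUIPMENT", "NR_EQUIPM"]
alt = pd.read_csv(io.StringIO(temp), sep=";", header=0, names=names)
print(alt["EQUIPMENT"])
```

The regex version is more convenient when the file layout may change again; the explicit-names version avoids the slower python engine but only works when the column order is known in advance.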