Tags: python, python-2.7, pandas, csv, io

How to delete a column from a data frame with pandas?


I read my data

import pandas as pd
df = pd.read_csv('/path/file.tsv', header=0, delimiter='\t')
print(df)

and get:

          id    text
0    361.273    text1...
1    374.350    text2...
2    374.350    text3...

How can I delete the id column from the above data frame? I tried the following:

import pandas as pd
df = pd.read_csv('/path/file.tsv', header=0, delimiter='\t')
print(df.drop('id', axis=1))

But it raises this exception:

ValueError: labels ['id'] not contained in axis

Solution

  • To actually delete the column

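    del df['id'] or df.drop('id', axis=1) should have worked, provided the name you pass matches the column label exactly. Note that drop returns a new DataFrame rather than changing df, so assign the result back (df = df.drop('id', axis=1)), whereas del removes the column from df in place. Passing axis as a keyword is also safer, since newer pandas versions no longer accept it positionally.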

    However, if you don't actually need to delete the column, you can simply select the column of interest:

    In [54]:
    
    df['text']
    Out[54]:
    0    text1
    1    text2
    2    textn
    Name: text, dtype: object
    

    If you never wanted the column in the first place, pass the list of columns you do want to read_csv via its usecols parameter:

    In [53]:
    import io
    temp=u"""id    text
    363.327    text1
    366.356    text2
    37782    textn"""
    df = pd.read_csv(io.StringIO(temp), delimiter=r'\s+', usecols=['text'])
    df
    Out[53]:
        text
    0  text1
    1  text2
    2  textn
    

    As for your error: it means 'id' is not one of the column labels, usually because the header is spelt differently or contains whitespace (for example 'id ' with a trailing space). To check, look at the output of print(df.columns.tolist()): it lists the column labels exactly as stored, so any leading or trailing whitespace shows up inside the quotes.
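A minimal reproduction of the whitespace case, assuming the header cell is 'id ' with a hypothetical trailing space (your real file may differ, and the inline data below is made up to mimic the question). It shows the failing drop, how df.columns.tolist() exposes the stray character, how df.columns.str.strip() cleans every label, and then both ways of deleting the column:

```python
import io

import pandas as pd

# Hypothetical tab-separated data: the header cell is 'id ' (trailing space),
# an assumed cause of the "labels ['id'] not contained in axis" error.
raw = u"id \ttext\n361.273\ttext1\n374.350\ttext2\n374.350\ttext3\n"

df = pd.read_csv(io.StringIO(raw), header=0, delimiter='\t')

try:
    df.drop('id', axis=1)
except (KeyError, ValueError) as exc:
    # Newer pandas raises KeyError here; older versions raised ValueError.
    print('drop failed: %s' % exc)

# The stray space shows up inside the quotes.
print(df.columns.tolist())  # ['id ', 'text']

# Strip leading/trailing whitespace from every column label.
df.columns = df.columns.str.strip()
print(df.columns.tolist())  # ['id', 'text']

# Option 1: drop returns a new DataFrame, so keep the result.
dropped = df.drop('id', axis=1)

# Option 2: del removes the column from df itself, in place.
del df['id']

print(dropped.columns.tolist())  # ['text']
print(df.columns.tolist())       # ['text']
```

Stripping the labels once right after read_csv means later lookups such as df['id'] work without having to guess the exact spacing.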