python pandas dataframe numeric

Pandas dataframe and to_numeric: select column by index


The question is probably extremely dumb, but I've hurt my brain trying to figure out what to do.

I have a pd.DataFrame with N columns. I need to select some columns by their index (position), convert all their values to numeric, and write the result back into the dataframe.

I've done it by column name (like df['a'] = pd.to_numeric(df['a'])) but I'm stuck doing it by index (like df[1] = pd.to_numeric(df[1])).

What is the right way to reference dataframe columns in this situation? (Python 2.7)


Solution

  • You can use iloc to select columns by position (0-based) and then apply to_numeric:

    import pandas as pd
    
    df = pd.DataFrame({1:['1','2','3'],
                       2:[4,5,6],
                       3:[7,8,9],
                       4:['1','3','5'],
                       5:[5,3,6],
                       6:['7','4','3']})
    
    print (df)
       1  2  3  4  5  6
    0  1  4  7  1  5  7
    1  2  5  8  3  3  4
    2  3  6  9  5  6  3
    
    print (df.dtypes)
    1    object
    2     int64
    3     int64
    4    object
    5     int64
    6    object
    dtype: object
    
    print (df.columns)
    Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
    
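    # iloc is position-based and 0-indexed: labels 1, 4, 6 sit at positions 0, 3, 5
    cols = [0,3,5]
    # assign back through the column labels so the converted dtypes replace the old ones
    df[df.columns[cols]] = df.iloc[:, cols].apply(pd.to_numeric)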
    
    print (df)
       1  2  3  4  5  6
    0  1  4  7  1  5  7
    1  2  5  8  3  3  4
    2  3  6  9  5  6  3
    
    print (df.dtypes)
    1    int64
    2    int64
    3    int64
    4    int64
    5    int64
    6    int64
    dtype: object
    

    If the column names are strings rather than ints (they only look like ints when printed), quote the numbers in the list cols:

    import pandas as pd
    
    df = pd.DataFrame({'1':['1','2','3'],
                       '2':[4,5,6],
                       '3':[7,8,9],
                       '4':['1','3','5'],
                       '5':[5,3,6],
                       '6':['7','4','3']})
    
    #print (df)
    
    #print (df.dtypes)
    
    print (df.columns)
    Index(['1', '2', '3', '4', '5', '6'], dtype='object')
    
    # quote the labels so they match the string column names
    cols = ['1','4','6']
    
    #1. loc: label based access only
    df[cols] = df.loc[:, cols].apply(pd.to_numeric)
    
    #2. iloc: position based access (0-based), same result as option 1
    # pos = [0,3,5]
    # df[df.columns[pos]] = df.iloc[:, pos].apply(pd.to_numeric)
    

    print (df)
       1  2  3  4  5  6
    0  1  4  7  1  5  7
    1  2  5  8  3  3  4
    2  3  6  9  5  6  3
    
    print (df.dtypes)
    1    int64
    2    int64
    3    int64
    4    int64
    5    int64
    6    int64
    dtype: object
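
The two variants above can be folded into one small helper. This is only a sketch: the function name to_numeric_by_position and the variable positions are made up for illustration and are not part of pandas. It assumes the positions are 0-based, translates them to column labels with df.columns, and assigns the converted columns back by label so that it works whether the labels are ints or strings.

```python
import pandas as pd


def to_numeric_by_position(df, positions):
    """Return a copy of df with the columns at the given 0-based
    positions converted by pd.to_numeric (hypothetical helper)."""
    out = df.copy()
    # Translate positions into the actual column labels.
    labels = list(out.columns[positions])
    # Select by position, convert, and assign back by label.
    out[labels] = out.iloc[:, positions].apply(pd.to_numeric)
    return out


df = pd.DataFrame({1: ['1', '2', '3'],
                   2: [4, 5, 6],
                   3: [7, 8, 9],
                   4: ['1', '3', '5'],
                   5: [5, 3, 6],
                   6: ['7', '4', '3']})

positions = [0, 3, 5]                # first, fourth and sixth column
print(list(df.columns[positions]))   # labels found at those positions
converted = to_numeric_by_position(df, positions)
print(converted.dtypes)
```

The helper works on a copy, so the original df keeps its string columns. If some values may not parse as numbers, one option is to pass errors='coerce' through apply, i.e. .apply(pd.to_numeric, errors='coerce'), which turns unparseable values into NaN instead of raising.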