Understanding the data type of a pandas Series


I have a pandas Series sliced from a DataFrame. The series has about 100,000 rows; most of the values are floats and the rest are infinity. But pandas reports the dtype of the whole series as 'object', so when I try to remove the non-numeric values, the whole series becomes NaN.

Below is an example of how the table is structured, including every kind of non-numeric entry. The symbol °° denotes infinity.

Time (µs)  ChannelA (mV)  ChannelB (mV)  ChannelC (mV)  ChannelD (mV)
        1            0.1            0.2            0.3            0.4
        2            0.5            0.7            0.4            0.5
        3            0.6            0.2            0.3           0.11
        4            0.8            0.6            0.7            0.6
        5             °°             °°             °°             °°

Why does pandas report the whole series as object?


Solution

  • You can convert each column to numeric with pd.to_numeric, passing errors='coerce', which converts non-numeric string values to NaN.

    import pandas as pd

    # Convert every column in place; values that cannot be parsed become NaN.
    for c in df:
        df[c] = pd.to_numeric(df[c], errors='coerce')
    

    OUTPUT:

    df
       Time (µs)  ChannelA (mV)  ChannelB (mV)  ChannelC (mV)  ChannelD (mV)
    0          1            0.1            0.2            0.3           0.40
    1          2            0.5            0.7            0.4           0.50
    2          3            0.6            0.2            0.3           0.11
    3          4            0.8            0.6            0.7           0.60
    4          5            NaN            NaN            NaN            NaN
    

    Data types after conversion:

    >>> df.dtypes
    Time (µs)          int64
    ChannelA (mV)    float64
    ChannelB (mV)    float64
    ChannelC (mV)    float64
    ChannelD (mV)    float64
    dtype: object
    

    If you don't want NaN, you can instead use replace to turn the °° entries into inf (floating-point infinity) and then convert the DataFrame to float. Note that astype(float) converts every column, so the Time column becomes float as well.

    df.replace('°°', float('inf')).astype(float)
    
       Time (µs)  ChannelA (mV)  ChannelB (mV)  ChannelC (mV)  ChannelD (mV)
    0        1.0            0.1            0.2            0.3           0.40
    1        2.0            0.5            0.7            0.4           0.50
    2        3.0            0.6            0.2            0.3           0.11
    3        4.0            0.8            0.6            0.7           0.60
    4        5.0            inf            inf            inf            inf
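
As for why the dtype is object in the first place: the most likely cause is that °° is not something the parser can read as a number, so the whole column is kept as text, including the values that look like floats. The sketch below reproduces that under some assumptions: the CSV text and the comma-separated layout are hypothetical stand-ins for your data, and it assumes °° is the only non-numeric token you want treated as infinity. It also shows how to list the tokens that fail to parse before deciding what to do with them.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV text mirroring the question's table (assumed layout).
csv_text = """Time (µs),ChannelA (mV),ChannelB (mV)
1,0.1,0.2
2,0.5,0.7
3,0.6,0.2
4,0.8,0.6
5,°°,°°
"""

df = pd.read_csv(io.StringIO(csv_text))
raw = df["ChannelA (mV)"]

# One unparseable token is enough: every value in the column is kept as text.
print(pd.api.types.is_numeric_dtype(raw))  # False
print(type(raw.iloc[0]))                   # <class 'str'>

# Coerce to numbers, then list the distinct tokens that failed to parse.
converted = pd.to_numeric(raw, errors="coerce")
bad_tokens = sorted(raw[converted.isna()].unique())
print(bad_tokens)                          # ['°°']

# Map only the known infinity marker to inf; any other junk stays NaN.
with_inf = converted.mask(raw == "°°", np.inf)
print(with_inf.dtype)                      # float64
```

Checking bad_tokens first is a cheap way to confirm that °° really is the only offender in 100,000 rows. Using mask on the coerced series, rather than a blanket replace, keeps any unexpected junk visible as NaN instead of silently turning it into infinity.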