python pandas series mixed-type

Pandas: Drop all strings from a mixed-type Series of integers and strings


This is driving me nuts. When I searched for tips on dropping elements from a DataFrame, I found nothing about mixed-type Series.

Say I have this DataFrame:

import pandas as pd
df = pd.DataFrame(data={'col1': [1,2,3,4,'apple','apple'], 'col2': [3,4,5,6,7,8]})
a = df['col1']

Then 'a' is a mixed-type Series with 6 elements. How can I remove all the 'apple' strings from a? I need the Series 1, 2, 3, 4.


Solution

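  • To keep the integers as integers, without converting them to float:

    Approach: keep only the rows whose values are numeric, rather than converting the non-numeric values to NaN and then dropping the NaN rows. This way there is no intermediate result containing NaN, which is what forces the values to change from integer to float. Note that str.isnumeric() is True only for strings made up entirely of numeric characters, so negative numbers such as -1 and decimals such as 1.5 would be filtered out as well.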

    a = pd.to_numeric(a[a.astype(str).str.isnumeric()])
    

    Result:

    The resulting dtype is the integer type int64:

    print(a)
    
    0    1
    1    2
    2    3
    3    4
    Name: col1, dtype: int64
    

    If you instead produce an intermediate result containing NaN, like below:

    a = pd.to_numeric(a, errors='coerce').dropna()
    

    The resulting dtype is forced to float (instead of integer):

    0    1.0
    1    2.0
    2    3.0
    3    4.0
    Name: col1, dtype: float64
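
Two variations on the approaches above, as a rough sketch rather than a definitive recipe. The first filters on the Python type of each element instead of on str.isnumeric(), so negative integers survive; the second takes the errors='coerce' route and casts back to an integer dtype afterwards. The sample data containing -3 and the names mask, ints, coerced, as_int and nullable are assumptions made for illustration, not part of the question.

```python
import pandas as pd

# Sample data (an assumption for illustration): like col1 in the question,
# but with a negative integer, which str.isnumeric() would reject.
a = pd.Series([1, 2, -3, 4, 'apple', 'apple'], name='col1')

# Variation 1: keep elements that are Python ints (bool excluded, since
# bool is a subclass of int), then cast the object Series to int64.
# Caveat: elements stored as NumPy integer scalars would not match this check.
mask = a.map(lambda v: isinstance(v, int) and not isinstance(v, bool))
ints = a[mask].astype('int64')

# Variation 2: coerce non-numeric values to NaN (this gives float64),
# drop them, then cast back to int64. This assumes every remaining
# value is a whole number.
coerced = pd.to_numeric(a, errors='coerce')
as_int = coerced.dropna().astype('int64')

# Or keep the missing slots by using the nullable integer dtype 'Int64'.
nullable = coerced.astype('Int64')

print(ints.dtype, as_int.dtype, nullable.dtype)
```

The nullable Int64 dtype (capital I) may be worth considering when the positions of the dropped values matter, since it keeps them as missing values without falling back to float.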