python pandas csv file-io percentage

Convert percent string to float in pandas read_csv


Is there a way to convert values like '34%' directly to int or float when using read_csv() in pandas? I want '34%' to be read directly as 0.34.

  1. Using this in read_csv did not work:

    read_csv(..., dtype={'col':np.float})

  2. After loading the CSV as 'df', this also did not work, failing with the error "invalid literal for float(): 34%":

    df['col'] = df['col'].astype(float)

  3. I ended up using this, which works but is long-winded:

    df['col'] = df['col'].apply(lambda x: np.nan if x in ['-'] else x[:-1]).astype(float)/100


Solution

  • You can define a custom function to convert your percents to floats at read_csv() time:

    import io
    import pandas as pd
    
    # dummy data
    temp1 = """index col
    113 34%
    122 50%
    123 32%
    301 12%"""
    
    # Custom function taken from https://stackoverflow.com/questions/12432663/what-is-a-clean-way-to-convert-a-string-percent-to-a-float
    def p2f(x):
        return float(x.strip('%'))/100
    
    # Pass to `converters` param as a dict (raw string avoids an invalid-escape warning for \s)
    df = pd.read_csv(io.StringIO(temp1), sep=r'\s+', index_col=[0], converters={'col': p2f})
    df
    
            col
    index      
    113    0.34
    122    0.50
    123    0.32
    301    0.12
    
    # Check that dtypes really are floats
    df.dtypes
    
    col    float64
    dtype: object
    

    My percent-to-float code is courtesy of ashwini's answer to "What is a clean way to convert a string percent to a float?"
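
    The question also mentions '-' placeholders, which p2f as written would choke on (float('-') raises ValueError). Below is a minimal sketch of one way to handle that, not a definitive implementation. The helper name p2f_safe and the sample data are made up for illustration, and treating '-' or an empty string as NaN is an assumption based on the workaround in the question. The second half shows a possible post-load alternative using vectorized string methods with pd.to_numeric.

```python
import io

import numpy as np
import pandas as pd


def p2f_safe(x):
    """Hypothetical helper: '34%' -> 0.34; '-' or an empty string -> NaN."""
    x = x.strip()
    if x in ("", "-"):  # assumed placeholders for missing values
        return np.nan
    return float(x.rstrip("%")) / 100


# Made-up sample data containing a '-' placeholder
data = """index col
113 34%
122 -
123 32%"""

# Option 1: convert while reading, via the `converters` parameter
df = pd.read_csv(
    io.StringIO(data), sep=r"\s+", index_col=0, converters={"col": p2f_safe}
)

# Option 2: read the column as strings, then convert it in one vectorized step;
# errors="coerce" turns anything non-numeric (such as '-') into NaN
raw = pd.read_csv(io.StringIO(data), sep=r"\s+", index_col=0)
raw["col"] = pd.to_numeric(raw["col"].str.rstrip("%"), errors="coerce") / 100
```

    Both options should give a float column with NaN in place of '-'. The converter keeps everything inside read_csv(), while the vectorized version is likely faster on large files since converters run as a Python call per value.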