python-3.x pandas exponential

Convert numeric strings in scientific notation (e+) to float in Python pandas


I've got a DataFrame with more than 30000 rows and almost 40 columns, loaded from a CSV file.

Most of it mixes str and int columns:

- integers are int

- floats and powers of ten are str

It looks like this:

Id       A                 B
1        2.5220019e+008    1742087
2        1.7766118e+008    2223964.5
3        3.3750285e+008    2705867.8
4        97782360          2.5220019e+008

I've tried the following code:

import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, LineString, shape

df = pd.read_csv('mycsvfile.csv').astype(float)

This yields the following error message:

ValueError: could not convert string to float: '-1.#IND'

I guess it has to do with the exponential notation for powers of ten (e+), which the Python libraries aren't able to convert.

Is there a way to fix it?


Solution

  • Following my conversation with QuangHoang, I should apply this function:

    pd.to_numeric(df['column'], errors='coerce')
    

    Since almost all of the DataFrame's columns hold str objects, I ran the following line:

    df2 = df.apply(lambda x : pd.to_numeric(x, errors='coerce'))
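
To see why this works, here is a minimal sketch on a small made-up frame (the sample values below are assumptions modelled on the table in the question, not the real file). The e+ strings parse without trouble. The token that actually breaks astype(float) is '-1.#IND', which is how Microsoft's C runtime prints an indeterminate NaN. With errors='coerce', pd.to_numeric turns anything it cannot parse into NaN instead of raising.

```python
import pandas as pd

# Hypothetical sample: every value is a string, as in the question,
# with one '-1.#IND' entry to reproduce the failing token.
df = pd.DataFrame({
    "A": ["2.5220019e+008", "1.7766118e+008", "-1.#IND", "97782360"],
    "B": ["1742087", "2223964.5", "2705867.8", "2.5220019e+008"],
})

# Convert column by column; unparseable strings become NaN.
df2 = df.apply(lambda col: pd.to_numeric(col, errors="coerce"))

print(df2.dtypes)              # both columns should now be float64
print(df2["A"].isna().sum())   # number of values that were coerced to NaN
```

Counting the NaN values afterwards is a quick way to check how many entries were silently coerced, which helps before trusting the converted data.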