Tags: python-3.x, pandas, numpy, csv, aws-glue

CSV file Infinity value issue with AWS Glue job


I have a CSV file that I read with pandas, and I am trying to convert NaN and Infinity values to 0.0. When I run the code locally, the conversion works as expected:

import numpy as np
import pandas as pd

df = pd.read_csv('test.csv')
print(df['C1'])
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df = df.fillna(0.00)
print(df['C1'])

Output:

0    NaN
1    inf
2    NaN
Name: C1, dtype: float64
0    0.0
1    0.0
2    0.0
Name: C1, dtype: float64

Here, the infinity and NaN values are both converted to 0.0, as the output shows. But when I run the same logic in an AWS Glue Python Shell job, the infinity value is not converted to 0.0. The code and output for the Glue job are below:

import numpy as np
import pandas as pd

df = pd.read_csv('s3://bucket/test.csv')
print(df['C1'])
df = df.replace([np.inf, -np.inf], np.nan)
df = df.fillna(0.00)
print(df['C1'])

Output:

0         NaN
1    Infinity
2         NaN
Name: C1, dtype: object
0           0
1    Infinity
2           0
Name: C1, dtype: object

The same file is used locally and on S3, but in Glue the infinity value is left untouched. Also, the column is read as float64 locally but as object in Glue. How can I get the same conversion in the Glue job?


Solution

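  • I resolved it based on BdR's response in the comments. The dtype: object output shows that the Glue environment read "Infinity" as a string rather than as a float, so replacing np.inf never matched it. Passing the literal strings as na_values makes read_csv treat them as missing values, which fillna then converts:

    df = pd.read_csv(input_path, na_values=["Infinity", "-Infinity"])
    df = df.replace([np.inf, -np.inf], np.nan)
    df = df.fillna(0.00)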
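
The fix above can be tried without S3 access. The following is a minimal sketch, not the original job: CSV_TEXT is a made-up stand-in for test.csv, the C2 column exists only so that the blank C1 rows are not skipped as empty lines, and dtype={"C1": object} is an assumption used to mimic the string column seen in Glue. It also shows pd.to_numeric as a possible alternative when the column has already been loaded as strings.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for test.csv.
CSV_TEXT = "C1,C2\n,1\nInfinity,2\n,3\n"

# Approach from the answer: declare the literal strings as missing at read time.
# np.inf is used because newer NumPy releases no longer provide np.Infinity.
fixed = pd.read_csv(io.StringIO(CSV_TEXT), na_values=["Infinity", "-Infinity"])
fixed = fixed.replace([np.inf, -np.inf], np.nan).fillna(0.0)

# Alternative (assumption, not from the answer): the column is already a
# string/object column, forced here with dtype to imitate the Glue result.
raw = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"C1": object})
coerced = pd.to_numeric(raw["C1"], errors="coerce")  # unparseable text becomes NaN
coerced = coerced.replace([np.inf, -np.inf], np.nan).fillna(0.0)

print(fixed["C1"])
print(coerced)
```

Both approaches should leave C1 as a float64 column of zeros. The na_values route is simpler when you control the read_csv call; the to_numeric route may be useful when the DataFrame comes from elsewhere.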