Tags: python, pandas, string, data-cleaning

Cleaning string column that contains float number


I am trying to remove the trailing point and zero (".0") from every float-like value in the CIP column of this dataset:

  index     CIP
    1        DF5TY34
    2        12342.0
    3        de44dW

(CIP is cast as a string.)

I wrote this line to solve the problem, but it's not doing anything. I only receive a warning, no errors:

 pro1[pro1['CIP'].str.contains('\..')]["CIP"] = pro1.loc[pro1['CIP'].str.contains('\..')]["CIP"].astype(float).astype(int).astype(str)

this is the warning:

/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
"""Entry point for launching an IPython kernel.

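The line above assigns into the result of chained indexing (`pro1[mask]["CIP"] = ...`), which pandas may evaluate on a temporary copy, so `pro1` itself is left unchanged and only the warning is raised. A minimal sketch of the same conversion done with a single `.loc[row_indexer, col_indexer]` call, as the warning suggests; the sample frame is rebuilt from the table above, and the mask keeps the question's original pattern.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
pro1 = pd.DataFrame({"index": [1, 2, 3], "CIP": ["DF5TY34", "12342.0", "de44dW"]})

# Boolean mask of rows whose CIP contains a dot followed by any character
mask = pro1["CIP"].str.contains(r"\..", regex=True)

# One .loc call selects rows and column together, so the assignment
# writes into pro1 itself instead of into a temporary copy
pro1.loc[mask, "CIP"] = pro1.loc[mask, "CIP"].astype(float).astype(int).astype(str)

print(pro1)
```

This still converts through `float`, so a value such as "AB.CD" would raise a `ValueError` and "1.5" would be truncated to "1". The string-only approaches in the solution avoid both problems.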
Solution

  • For a strict removal of a single trailing .0, you can use str.removesuffix (available in pandas 1.4 and later):

    df['CIP'] = df['CIP'].str.removesuffix('.0')
    

    For a more flexible approach that also handles several trailing zeros (e.g. 12342.00), use a regex with str.replace:

    df['CIP'] = df['CIP'].str.replace(r'\.0*$', '', regex=True)
    

    output:

       index      CIP
    0      1  DF5TY34
    1      2    12342
    2      3   de44dW
    

    regex:

    \.   # match a dot
    0*   # match any number of 0 (including none)
    $    # match end of string
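To see how the two approaches differ, here is a small sketch that runs both on the question's values plus two hypothetical extra cases ("12342.00" and "10.50", which are assumptions added for illustration, not from the original data).

```python
import pandas as pd

# The question's values plus two hypothetical extra cases
s = pd.Series(["DF5TY34", "12342.0", "de44dW", "12342.00", "10.50"])

# Strict: drop exactly one trailing ".0" (needs pandas >= 1.4)
strict = s.str.removesuffix(".0")

# Flexible: drop a dot followed only by zeros up to the end of the string
flexible = s.str.replace(r"\.0*$", "", regex=True)

print(pd.DataFrame({"original": s, "removesuffix": strict, "regex": flexible}))
```

Only the regex version turns "12342.00" into "12342". Both leave "10.50" alone, because neither pattern matches when a non-zero digit follows the dot.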