python pandas data-science data-processing

The Data Contains Faulty Readings with a Certain Format


I have a pandas DataFrame where one of my columns has faulty values. I want to clean these values.

The faulty values are negative and end with <, for example '-2.44<'. How do I fix this without affecting other columns? My index is date-time.

I have tried to convert the column to numeric data.

df['values'] = pd.to_numeric(df['values'], errors='coerce')

There are no error messages, but errors='coerce' turns the faulty values into NaN. I'd like to keep them instead, with only the trailing '<' removed.


Solution

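  • Use Series.str.rstrip to remove < from the right side. Select the column with df['values'] (assuming it is named values), because the attribute df.values returns the whole DataFrame as a NumPy array, which has no .str accessor:

    df['values'] = pd.to_numeric(df['values'].str.rstrip('<'), errors='coerce')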
    

    Or, more generally, use Series.str.strip, which removes the listed characters from both ends of each string, so you can add more characters to strip:

    df['values'] = pd.to_numeric(df['values'].str.strip('<>'), errors='coerce')
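For a quick check, here is a minimal sketch of the rstrip approach on made-up data. The column names values and other, the sample readings, and the dates are all assumptions for illustration, not taken from the question. The astype(str) call is an extra precaution that the lines above do not use: if the column mixes strings with real floats, the .str accessor would return NaN for the floats, so converting everything to strings first should avoid losing them.

```python
import pandas as pd

# Hypothetical sample data: the column names and readings are assumptions.
df = pd.DataFrame(
    {"values": ["1.5", "-2.44<", "3.0", "-0.7<"], "other": [10, 20, 30, 40]},
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)

# Plain coercion: strings that cannot be parsed (the ones ending in "<") become NaN.
coerced = pd.to_numeric(df["values"], errors="coerce")

# Strip the trailing "<" first, then convert.
# astype(str) guards against a column that mixes strings with real floats.
df["values"] = pd.to_numeric(
    df["values"].astype(str).str.rstrip("<"), errors="coerce"
)

print(coerced)
print(df)
```

Only the values column is reassigned, so the other column and the date-time index stay as they were.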