pandas dataframe data-cleaning

Data cleaning in Pandas


I have an age column with values such as 10+, <9, or >45. I have to clean this data and make it ready for EDA. What sort of logic can I use to clean it?


Solution

  • Hopefully this works for your case: use str.extract to pull only the integer part out of each string,

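    import pandas as pd

    df = pd.DataFrame(
        data=[
            {'emp_length': '10+years'},
            {'emp_length': '3 years'},
            {'emp_length': '<1 year'},
        ]
    )
    # Keep only the first run of digits; expand=False returns a Series.
    df['emp_length'] = df['emp_length'].str.extract(r'(\d+)', expand=False)
    # str.extract yields strings, so convert to integers for EDA.
    df['emp_length'] = df['emp_length'].astype(int)
    df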
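
    If the <, > and + markers matter for your EDA (for example, >45 is a lower bound, not exactly 45), you may want to keep them in a separate column instead of discarding them. Here is a minimal sketch of that idea, assuming the column is called age and holds values like the ones in the question. The sample data and the age_num and age_qualifier column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data modelled on the values in the question.
df = pd.DataFrame({'age': ['10+', '<9', '>45', '30']})

# Capture an optional leading '<' or '>', the digits, and an optional trailing '+'.
pattern = r'^\s*(?P<prefix>[<>])?\s*(?P<num>\d+)\s*(?P<suffix>\+)?'
parts = df['age'].str.extract(pattern)

# Numeric part; anything that did not match becomes NaN instead of raising.
df['age_num'] = pd.to_numeric(parts['num'], errors='coerce')

# Keep whichever marker is present; plain numbers get an empty string.
df['age_qualifier'] = parts['prefix'].fillna(parts['suffix']).fillna('')
```

    With both columns you can decide later how to treat the open-ended values, e.g. drop them, bucket them, or use the number as is.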