Tags: python, pandas, dataframe, jupyter-notebook, data-cleaning

Can you remove measurements (g, kg, ml, etc.) from a pandas DataFrame?


I am doing some preprocessing on one particular column of a data set, 'Title'. I have already removed numbers and punctuation, but I also want to remove measurements. The measurements are not in a separate column; they are part of the text in the 'Title' column.

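import pandas as pd

# Load the data set
df = pd.read_csv(r'example')

# Remove punctuation, then numbers.
# regex=True is required: since pandas 2.0, str.replace treats the pattern literally by default.
df['Title'] = df['Title'].str.replace(r'[^\w\s]+', '', regex=True)
df['Title'] = df['Title'].str.replace(r'\d+', '', regex=True)
print(df['Title'])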
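To see why a further step is needed, the preprocessing above can be run on a small in-memory DataFrame instead of the CSV. The sample titles are hypothetical; the point is that once punctuation and digits are stripped, the unit letters ("ml", "kg", "g") are still left behind, sometimes with a doubled space.

```python
import pandas as pd

# Hypothetical sample titles standing in for the CSV from the question
df = pd.DataFrame({'Title': ['Whole Milk 500ml', 'Plain Flour 1.5 kg', 'Sea Salt, 250 g']})

# Same preprocessing as the question, with regex=True made explicit
df['Title'] = df['Title'].str.replace(r'[^\w\s]+', '', regex=True)  # drop punctuation
df['Title'] = df['Title'].str.replace(r'\d+', '', regex=True)       # drop digits

print(df['Title'].tolist())
# ['Whole Milk ml', 'Plain Flour  kg', 'Sea Salt  g']
```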

[Images: the returned output and the dataset column]


Solution

  • df['Title'] = df['Title'].str.replace(r'\sg$|\skg$|\sml$', '', regex=True)

    This removes a trailing g, kg or ml, as an example. More generally, removing the last word amounts to:

    df['Title'] = df['Title'].str.replace(r'\s[a-z]+$', '', regex=True)
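Both patterns above only catch a unit at the very end of the title, and the "last word" version will also delete an ordinary lowercase final word. A minimal sketch of an alternative, assuming you can list the unit tokens up front: build one whole-word, case-insensitive pattern from that list so units are removed wherever they appear, then collapse the leftover whitespace. `UNITS` and `remove_units` are hypothetical names for illustration, not part of pandas; note that single-letter units such as "g" or "l" will also strip any standalone letter g or l.

```python
import re
import pandas as pd

# Hypothetical list of unit tokens to strip; extend as needed
UNITS = ['g', 'kg', 'mg', 'ml', 'l']

# Longest tokens first, wrapped in word boundaries: \b(?:kg|mg|ml|g|l)\b
unit_pattern = r'\b(?:' + '|'.join(map(re.escape, sorted(UNITS, key=len, reverse=True))) + r')\b'

def remove_units(titles: pd.Series) -> pd.Series:
    """Drop unit words anywhere in each title, then collapse leftover spaces."""
    cleaned = titles.str.replace(unit_pattern, '', regex=True, flags=re.IGNORECASE)
    return cleaned.str.replace(r'\s+', ' ', regex=True).str.strip()

# Titles as they look after the question's digit/punctuation removal
titles = pd.Series(['Whole Milk ml', 'Plain Flour  kg', 'KG Bag Sugar', 'Garlic g'])
print(remove_units(titles).tolist())
# ['Whole Milk', 'Plain Flour', 'Bag Sugar', 'Garlic']
```

Because the pattern uses `\b` word boundaries, the "g" inside "Garlic" and the "l" inside "Milk" are left alone; only standalone unit tokens are removed.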