Tags: pandas, dataframe, machine-learning, linear-regression, supervised-learning

Removing the "cc" unit from column values for machine learning


Below is the engine column I have taken from a dataset:

engine
2150 cc
2240 cc
2150 cc
2230 cc
2050 cc
2280 cc

I want the engine column to contain only the numbers, without the "cc" unit:

engine
2150
2240
2150
2230
2050
2280
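A minimal sketch for reproducing the problem, assuming the data is loaded into a pandas DataFrame named df (the name is an assumption; the values are copied from the column above). Because every entry carries a " cc" suffix, pandas stores the column as text, so it cannot be used as a numeric feature in a regression model yet.

```python
import pandas as pd

# Rebuild the engine column shown above; each value is a string with a " cc" suffix
df = pd.DataFrame({"engine": ["2150 cc", "2240 cc", "2150 cc", "2230 cc", "2050 cc", "2280 cc"]})

# False: the column holds strings such as "2150 cc", not numbers
print(pd.api.types.is_numeric_dtype(df["engine"]))
```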

Solution

  • You can extract the leading digits with a regex:

    # Raw string avoids the invalid "\d" escape warning; expand=False returns a Series
    df['engine'] = df['engine'].str.extract(r'(^\d+)', expand=False)
    

    output:

      engine
    0   2150
    1   2240
    2   2150
    3   2230
    4   2050
    5   2280
    

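    The extracted values are still strings. If you need integers (for example, to feed the column to a regression model), cast them: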

    df['engine'] = df['engine'].str.extract(r'(^\d+)', expand=False).astype(int)
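The astype(int) cast above raises an error if any row is missing or does not start with a number, because str.extract returns NaN for those rows. A more tolerant sketch, assuming such rows should become missing values rather than stop the conversion, coerces them with pd.to_numeric and stores the result in pandas' nullable Int64 dtype. The None and "unknown" rows are hypothetical, added only to show the behavior.

```python
import pandas as pd

# Hypothetical data: the None and "unknown" rows are assumptions, not from the original dataset
df = pd.DataFrame({"engine": ["2150 cc", "2240 cc", None, "unknown"]})

# Leading digits as strings; NaN where a row has no leading number
digits = df["engine"].str.extract(r'(^\d+)', expand=False)

# Coerce to numbers, then use the nullable Int64 dtype so missing rows stay <NA>
df["engine"] = pd.to_numeric(digits, errors="coerce").astype("Int64")

print(df["engine"])
```

Rows that end up as <NA> can then be dropped or imputed before training, depending on what the model needs.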