python pandas dataframe data-cleaning feature-extraction

Extract the numbers in a string from a column in pandas dataframe

I need to do feature extraction on the 'Amenities' column of the dataframe house_data.

The Amenities column contains the following data:

house_data['Amenities']

3                       3 beds 1 bath
4              1 bed 1 bath 1 parking
5                       3 beds 1 bath
6            2 beds 2 baths 2 parking
7             3 beds 1 bath 2 parking
                    ...              
2096    3 beds 2 baths 1 parking 419m
2097          4 beds 1 bath 2 parking
2098         3 beds 2 baths 2 parking
2099         2 beds 2 baths 1 parking
2100    3 beds 2 baths 1 parking 590m
Name: Amenities, Length: 1213, dtype: object

I need to extract the number of beds, baths and parking spaces and store them in 3 separate columns.

# Raw string so that \d and \. reach the regex engine unchanged
house_data["bedrooms"] = house_data["Amenities"].str.extract(r"(\d*\.?\d+)", expand=False)



3       3
4       1
5       3
6       2
7       3
       ..
2096    3
2097    4
2098    3
2099    2
2100    3
Name: bedrooms, Length: 1213, dtype: object

The above code extracts only the first number in each string. How can I extract the numbers for baths and parking and store them in separate columns?

Solution

We can use named groups here with Series.str.extract:

# Each named group becomes a column in the extracted DataFrame
regex = r'(?P<beds>\d+)\sbeds?\s(?P<bath>\d+)\sbaths?\s?(?P<parking>\d+)?'
df = pd.concat([df, df['Amenities'].str.extract(regex)], axis=1)

                       Amenities beds bath parking
0                  3 beds 1 bath    3    1     NaN
1         1 bed 1 bath 1 parking    1    1       1
2                  3 beds 1 bath    3    1     NaN
3       2 beds 2 baths 2 parking    2    2       2
4        3 beds 1 bath 2 parking    3    1       2
5  3 beds 2 baths 1 parking 419m    3    2       1
6        4 beds 1 bath 2 parking    4    1       2
7       3 beds 2 baths 2 parking    3    2       2
8       2 beds 2 baths 1 parking    2    2       1
9  3 beds 2 baths 1 parking 590m    3    2       1
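
Two things may be worth tightening, depending on your data. The extracted columns hold strings, not numbers, and the optional parking group in the pattern above is not tied to the word "parking", so a row with an area but no parking could have a digit of the area captured as the parking count. Below is a minimal sketch of one way to handle both. The sample frame is made up to mirror the question (the last row, with an area and no parking, is an assumption and does not appear in the original data), and the choice of the nullable Int64 dtype is mine, not something the question requires.

```python
import pandas as pd

# Hypothetical sample mirroring the rows shown in the question.
df = pd.DataFrame({
    "Amenities": [
        "3 beds 1 bath",
        "1 bed 1 bath 1 parking",
        "3 beds 2 baths 1 parking 419m",
        "3 beds 1 bath 419m",  # made-up row: area but no parking
    ]
})

# Every count must be followed by its own label, so a trailing area
# such as "419m" cannot be mistaken for a parking count.
regex = (
    r"(?P<beds>\d+)\sbeds?"
    r"\s(?P<bath>\d+)\sbaths?"
    r"(?:\s(?P<parking>\d+)\sparking)?"
)

counts = df["Amenities"].str.extract(regex)

# str.extract returns strings (missing where a group did not match).
# Convert to the nullable integer dtype so missing values are kept.
counts = counts.apply(pd.to_numeric).astype("Int64")

df = pd.concat([df, counts], axis=1)
print(df)
```

If you would rather treat a missing parking count as zero, calling fillna(0) on that column after the conversion should do it.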