I have a dataset with a column ('facilities') of object dtype. It contains several missing values, and its string values have no spaces between the words, as shown below:
How can I add spaces between the words? I tried the code below, but it doesn't work:
X['Restaurant'] = X['facilities'].apply(lambda x: 1 if 'Restaurant' in x else 0)
X['BAR'] = X['facilities'].apply(lambda x: 1 if 'BAR' in x else 0)
X['SwimmingPools'] = X['facilities'].apply(lambda x: 1 if 'SwimmingPools' in x else 0)
df3 = X['facilities'].str.split(n=1, expand=True)
df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns]
You can use re.split to split the words into a list, then .join the list using whitespace as the separator:
import pandas as pd
import re
df = pd.DataFrame({"facilities":["GymrestaurantbarInternetSwimmingPools",
                                 "Poolrestaurantgyminternetbar",
                                 "BARswimmingPoolsInternetgym"]})
#                               facilities
# 0  GymrestaurantbarInternetSwimmingPools
# 1           Poolrestaurantgyminternetbar
# 2            BARswimmingPoolsInternetgym
# Add every word you want to separate to this alternation
pattern = '(gym|restaurant|internet|swimmingpools|bar)'
# Missing values (NaN) are not strings, so they are passed through unchanged
df["facilities_cleaned"] = df["facilities"].apply(lambda s: " ".join(w for w in re.split(pattern, s.lower()) if w) if isinstance(s, str) else s)
#                               facilities                         facilities_cleaned
# 0  GymrestaurantbarInternetSwimmingPools  gym restaurant bar internet swimmingpools
# 1           Poolrestaurantgyminternetbar           pool restaurant gym internet bar
# 2            BARswimmingPoolsInternetgym             bar swimmingpools internet gym
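Since the question mentions missing values and seems to be aiming for 0/1 indicator columns, here is a possible vectorized variant using only the pandas .str accessor, which skips NaN on its own. Treat it as a sketch under assumptions: the sample frame (including the None row) and the keywords list are made up for illustration, so substitute your own words.

```python
import pandas as pd

# Hypothetical sample data; the None row stands in for a missing value.
df = pd.DataFrame({"facilities": ["GymrestaurantbarInternetSwimmingPools",
                                  None,
                                  "BARswimmingPoolsInternetgym"]})

# Assumed keyword list: extend it with every word you want to separate.
keywords = ["swimmingpools", "restaurant", "internet", "gym", "bar"]
pattern = "(" + "|".join(keywords) + ")"

cleaned = (
    df["facilities"]
    .str.lower()                                # NaN stays NaN
    .str.replace(pattern, r" \1 ", regex=True)  # pad each keyword with spaces
    .str.split()                                # split on runs of whitespace
    .str.join(" ")                              # rejoin with single spaces
)
df["facilities_cleaned"] = cleaned

# One 0/1 indicator column per keyword; missing rows count as 0.
for kw in keywords:
    df[kw] = cleaned.str.contains(kw, regex=False, na=False).astype(int)
```

The original lambda attempts most likely fail because `'Restaurant' in x` raises a TypeError when x is NaN (a float), and because the check is case-sensitive. Lowercasing first and passing na=False to str.contains avoids both problems.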