I have a tough situation here where I need to perform multiple operations to get a new column. I have a list of substrings. If 'company' column does not contain any of the substrings from list, I have to combine 'compnay' with 'city' (without numbers) into a 'new' column. In contrast, 'new' will return 'company' column if there is match of substring.
substring list
list=['co','co.','ltd','ltd.','limited','inc','inc.']
Current dataframe
City Company
10001 New York Nike ltd
Paris Louis Vuitton
689 Tokyo Nissan inc.
Milan Ferrari co
999 Hong Kong Li&Fung
Vancouver Lululemon
Seoul Samsung limited
Desired output
City Company New
10001 New York Nike Ltd Nike Ltd
2345 Paris Louis Vuitton Louis Vuitton, Paris
689 Tokyo Nissan inc. Nissan inc.
Milan Ferrari co Ferrari co
999 Hong Kong Li&Fung Li&Fung, Hong Kong
Vancouver Lululemon Lululemon, Vancouver
Seoul Samsung limited Samsung Limited
Thanks in advance! Any help is appreciated!!
Using pandas.Series.where
, str.contains
and extract
:
l=['co','co.','ltd','ltd.','limited','inc','inc.']
s = df["Company"]
city = df["City"].str.extract("(\D+)",expand=False).str.strip()
df["new"] = s.where(s.str.contains("|".join(l)),
lambda x:x+", "+city)
print(df)
print(df)
Output:
City Company new
0 10001 New York Nike ltd Nike ltd
1 Paris Louis Vuitton Louis Vuitton, Paris
2 689 Tokyo Nissan inc. Nissan inc.
3 Milan Ferrari co Ferrari co
4 999 Hong Kong Li&Fung Li&Fung, Hong Kong
5 Vancouver Lululemon Lululemon, Vancouver
6 Seoul Samsung limited Samsung limited