An example df
data = {'id':[1,2,3,4,5,6,7], 'location':['Havana,Cuba', 'Santiago de Cuba' , 'La Habana', 'Cuba', 'Havana, Cuba', 'Santiago, Chile', 'Chile']}
# Create DataFrame
df = pd.DataFrame(data)
print(df)
I use the following to split the string in location column.
# Splits string into two new columns
df[['location_1', 'location_2']] = df['location'].str.split('[,|.]', 1, expand=True)
I need to further process by checking the value of LOCATION_1 against a list. If it meets one of the the STRING values from the list I need to replace the value of LOCATION_2 with the match list value and replace the LOCATION_1 value with "".
The example below (based on) is a start but needs to be conditional on a match with the list as it overwrites existing data in LOCATION_2. Another helpful solution is here but does not use a List.
sample_list=['Cuba','Chile']
df['location_2'] = df['location_1'].apply(lambda x: ''.join([L for L in sample_list if L in x]))
df['location_2'] = df['location_2'].mask(df['location_2'] == '')
print(df)
Finally, I am not trying to only extract "Cuba" and "Chile" but I also need to modify a string such as "Santiago de Cuba" to result in "Santiago de" in LOCATION_1 and "Cuba" in Location_2.
Final table should look something like this.
id location location_1 location_2
0 1 Havana,Cuba Havana Cuba
1 2 Santiago de Cuba Santiago de Cuba
2 3 La Habana La Habana
3 4 Cuba Cuba
4 5 Havana, Cuba Havana Cuba
5 6 Santiago, Chile Santiago Chile
6 7 Chile Chile
This should get you the answer your looking for IIUC
data = {'id':[1,2,3,4,5,6,7], 'location':['Havana,Cuba', 'Santiago de Cuba' , 'La Habana', 'Cuba', 'Havana, Cuba', 'Santiago, Chile', 'Chile']}
# Create DataFrame
df = pd.DataFrame(data)
sample_list=['Cuba','Chile']
df[['location_1', 'location_2']] = df['location'].str.split('[,|.]', 1, expand=True)
for i in range(0, len(sample_list)):
df['location_2'] = np.where(df['location'].str.contains(sample_list[i]), sample_list[i], df['location_2'])
df['location_2'] = df['location_2'].fillna(' ')
df