I have a column of strings, which I'll call "compounds":
Composition (column title)
ZrMo3
Gd(CuS)3
Ba2DyInTe5
I have another column that holds the symbols of metal elements from the periodic table, which I'll call "metals":
Elements (column title)
Li
Be
Na
The objective is to check each string in "compounds" against every string in "metals"; if any string from "metals" appears in the compound, the compound should be classified as True. Any ideas how I can code this?
Example: (if "metals" has Zr, Ag, and Te)
ZrMo3 True
Gd(CuS)3 False
Ba2DyInTe5 True
I tried the code below, but every result came out False:
asd = subset['composition'].isin(metals['Elements'])
print(asd)
I also tried this code and got all False as well:
subset['Boolean'] = subset.apply(lambda x: True if any(word in x.composition for word in metals) else False, axis=1)
Assuming you are using pandas, you can use a list comprehension inside your lambda, since you essentially need to check each compound string against every entry in the elements list:
import pandas as pd
elements = ['Li', 'Be', 'Na', 'Te']
compounds = ['ZrMo3', 'Gd(CuS)3', 'Ba2DyInTe5']
df = pd.DataFrame(compounds, columns=['compounds'])
print(df)
output
    compounds
0       ZrMo3
1    Gd(CuS)3
2  Ba2DyInTe5
df['boolean'] = df.compounds.apply(lambda x: any([el in x for el in elements]))
print(df)
output
    compounds  boolean
0       ZrMo3    False
1    Gd(CuS)3    False
2  Ba2DyInTe5     True
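As for why both attempts in the question returned all False: `isin` compares whole cell values for equality, so 'ZrMo3' never equals 'Zr', and iterating over a DataFrame (as in `for word in metals`) yields its column labels rather than the values in the column. A minimal sketch, assuming `subset` and `metals` are DataFrames with the column names used in the question (the sample values are taken from the question's example):

```python
import pandas as pd

# Assumed shape of the question's data; only the column names come from the question
subset = pd.DataFrame({'composition': ['ZrMo3', 'Gd(CuS)3', 'Ba2DyInTe5']})
metals = pd.DataFrame({'Elements': ['Zr', 'Ag', 'Te']})

# isin() tests whole-cell equality, so no compound equals a bare element symbol
exact = subset['composition'].isin(metals['Elements'])

# iterating a DataFrame yields its column labels, not the values
labels = list(metals)

# iterating the column's values gives the intended substring test
fixed = subset['composition'].apply(
    lambda x: any(el in x for el in metals['Elements'])
)

print(exact.tolist())  # [False, False, False]
print(labels)          # ['Elements']
print(fixed.tolist())  # [True, False, True]
```

The last line matches the expected result given in the question's example.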
If you are not using pandas, you can apply the same lambda function to the list with the map function:
out = list(
    map(
        lambda x: any([el in x for el in elements]),
        compounds,
    )
)
print(out)
output
[False, False, True]
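One caveat with a plain substring test is that a one-letter symbol also matches inside a two-letter symbol: 'S' (sulfur) is found in 'SiO2', which contains silicon but no sulfur. A small sketch of the false positive, using made-up sample lists:

```python
# Illustrative sample data (not from the question)
elements = ['Li', 'Be', 'Na', 'Te', 'S']
compounds = ['SiO2', 'ZrMo3']

# plain substring test, same logic as the lambda above
naive = [any(el in c for el in elements) for c in compounds]

print(naive)  # [True, False] -> 'SiO2' is flagged because 'S' is a substring of 'Si'
```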
Here is a more complex version, based on the regular expression module re, which also tackles the potential errors @Ezon mentioned. Since this approach loops not only over the elements to compare against a single compound string, but also over each constituent of the compound, I wrote two helper functions to keep it readable.
import re
import pandas as pd
def split_compounds(c):
    # remove all non-alphabetic characters (digits, parentheses, ...)
    c_split = re.sub(r"[^a-zA-Z]", "", c)
    # split the string at capital letters and join the parts with '-'
    c_split = '-'.join(re.findall('[A-Z][^A-Z]*', c_split))
    return c_split

def compare_compound(compound, element):
    # split the '-'-joined compound string back into a list of symbols
    compound_list = compound.split('-')
    # exact match against whole symbols, so 'S' does not match 'Si'
    return any([element == c for c in compound_list])
# build sample data
compounds = ['SiO2', 'Ba2DyInTe5', 'ZrMo3', 'Gd(CuS)3']
elements = ['Li', 'Be', 'Na', 'Te', 'S']
df = pd.DataFrame(compounds, columns=['compounds'])
# split compounds into elements
df['compounds_elements'] = [split_compounds(x) for x in compounds]
print(df)
output
    compounds  compounds_elements
0        SiO2                Si-O
1  Ba2DyInTe5         Ba-Dy-In-Te
2       ZrMo3               Zr-Mo
3    Gd(CuS)3             Gd-Cu-S
# check if any item from 'elements' is in the compounds
df['boolean'] = df.compounds_elements.apply(
    lambda x: any([compare_compound(x, el) for el in elements])
)
print(df)
output
    compounds  compounds_elements  boolean
0        SiO2                Si-O    False
1  Ba2DyInTe5         Ba-Dy-In-Te     True
2       ZrMo3               Zr-Mo    False
3    Gd(CuS)3             Gd-Cu-S     True
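The same idea can be written more compactly by extracting the symbols straight into a list and testing it against a set. This is only a sketch: `contains_any` and `SYMBOL` are names I made up, and the pattern assumes every element symbol is one capital letter optionally followed by one lowercase letter.

```python
import re

# Assumption: a symbol is one capital letter plus at most one lowercase letter
SYMBOL = re.compile(r'[A-Z][a-z]?')

def contains_any(formula, wanted):
    """Return True if the formula contains at least one symbol from `wanted`."""
    # findall() yields whole symbols, e.g. 'Gd(CuS)3' -> ['Gd', 'Cu', 'S']
    return not wanted.isdisjoint(SYMBOL.findall(formula))

compounds = ['SiO2', 'Ba2DyInTe5', 'ZrMo3', 'Gd(CuS)3']
elements = {'Li', 'Be', 'Na', 'Te', 'S'}

result = [contains_any(c, elements) for c in compounds]
print(result)  # [False, True, False, True]
```

With pandas, the helper can be applied to a column in the same way as before, for example `df['compounds'].apply(lambda c: contains_any(c, elements))`.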