Search code examples
pythonpandas

Convert pandas heading string containing chemical name to chemical symbol


I have a macro that converts a cell string that contains a chemical name to just its chemical symbol. e.g. a cell that contains "Silver ICM-MS ug/kg" would become simply "Ag"(the chemical symbol for silver). I am trying to convert this to Python to use in processing of lab data where I would convert wordy column headings in a dataframe to just chemical symbols. Headings that don't contain an element name can be left as they are. My attempt isn't working:

#load dictionary of element names, import pandas and load dummy data
dict_of_elements={"Sulphate":"SO4","Silver":"Ag","Aluminium":"Al","Americium":"Am","Argon":"Ar","Arsenic":"As","Astatine":"At","Gold":"Au","Boron":"B","Barium":"Ba","Beryllium":"Be","Bohrium":"Bh","Bismuth":"Bi","Berkelium":"Bk","Bromine":"Br","Calcium":"Ca","Cadmium":"Cd","Cerium":"Ce","Californium":"Cf","Chlorine":"Cl","Curium":"Cm","Copernicium":"Cn","Cobalt":"Co","Chromium":"Cr","Caesium":"Cs","Copper":"Cu","Dubnium":"Db","Darmstadtium":"Ds","Dysprosium":"Dy","Erbium":"Er","Einsteinium":"Es","Europium":"Eu","Fluorine":"F","Iron":"Fe","Flerovium":"Fl","Fermium":"Fm","Francium":"Fr","Gallium":"Ga","Gadolinium":"Gd","Germanium":"Ge","Hydrogen":"H","Helium":"He","Hafnium":"Hf","Mercury":"Hg","Holmium":"Ho","Hassium":"Hs","Iodine":"I","Indium":"In","Iridium":"Ir","Potassium":"K","Krypton":"Kr","Lanthanum":"La","Lithium":"Li","Lawrencium":"Lr","Lutetium":"Lu","Livermorium":"Lv","Mendelevium":"Md","Magnesium":"Mg","Manganese":"Mn","Molybdenum":"Mo","Meitnerium":"Mt","Nitrogen":"N","Sodium":"Na","Niobium":"Nb","Neodymium":"Nd","Neon":"Ne","Nickel":"Ni","Nobelium":"No","Neptunium":"Np","Oxygen":"O","Osmium":"Os","Phosphorus":"P","Protactinium":"Pa","Lead":"Pb","Palladium":"Pd","Promethium":"Pm","Polonium":"Po","Praseodymium":"Pr","Platinum":"Pt","Plutonium":"Pu","Radium":"Ra","Rubidium":"Rb","Rhenium":"Re","Rutherfordium":"Rf","Roentgenium":"Rg","Rhodium":"Rh","Radon":"Rn","Ruthenium":"Ru","Sulfur":"S","Antimony":"Sb","Scandium":"Sc","Selenium":"Se","Seaborgium":"Sg","Silicon":"Si","Samarium":"Sm","Tin":"Sn","Strontium":"Sr","Tantalum":"Ta","Terbium":"Tb","Technetium":"Tc","Tellurium":"Te","Thorium":"Th","Titanium":"Ti","Thallium":"Tl","Thulium":"Tm","Uranium":"U","Ununoctium":"Uuo","Ununpentium":"Uup","Ununseptium":"Uus","Ununtrium":"Uut","Vanadium":"V","Tungsten":"W","Xenon":"Xe","Yttrium":"Y","Ytterbium":"Yb","Zinc":"Zn"}

import pandas as pd

data = {'Silver ICP-MS': [1,2,3], 'Aluminium(III)': [4,5,6], 'Sulphate-LECO': [7,8,9], 'NOT a match': [10,11,12]}
df = pd.DataFrame(data)

df

Out[1]:
   Aluminium(III)  NOT a match  Silver ICP-MS  Sulphate-LECO
0               4           10              1              7
1               5           11              2              8
2               6           12              3              9

I have tried variations along the lines of:

element_cols=[key for key in dict_of_elemets.keys()in df.columns]

print(element_cols)

but I'm still getting to grips with list comprehensions and it's not working.

Or I can loop through and get the matches in a list and then feed that into df.columns:

cols=[]
for i in df.columns:
    for key, value in dict_of_elements.items():
         i= i.replace(key, value)
    cols.append(i)
df.columns=cols    
df

which gives:

Out[3]:
   Al(III)  NOT a match  Ag ICP-MS  SO4-LECO
0        4           10          1         7
1        5           11          2         8
2        6           12          3         9

But the entire header should be replaced with the symbol "Ag, SO4...".


Solution

  • This seems to work, plus I understand it.

    Dummy data:

    import pandas as pd
    
    data = {'Silver ICP-MS': [1,2,3], 'Aluminium(III)': [4,5,6], 'Sulphate-LECO': [7,8,9], 'NOT a match': [10,11,12]}
    df = pd.DataFrame(data)
    print ('starting data')
    display(df)
    

    Code:

    dict_of_elements={"Silver":"Ag","Aluminium":"Al","Americium":"Am","Argon":"Ar","Arsenic":"As","Astatine":"At","Gold":"Au","Boron":"B","Barium":"Ba","Beryllium":"Be","Bohrium":"Bh","Bismuth":"Bi","Berkelium":"Bk","Bromine":"Br","Calcium":"Ca","Cadmium":"Cd","Cerium":"Ce","Californium":"Cf","Chlorine":"Cl","Curium":"Cm","Copernicium":"Cn","Cobalt":"Co","Chromium":"Cr","Caesium":"Cs","Copper":"Cu","Dubnium":"Db","Darmstadtium":"Ds","Dysprosium":"Dy","Erbium":"Er","Einsteinium":"Es","Europium":"Eu","Fluorine":"F","Iron":"Fe","Flerovium":"Fl","Fermium":"Fm","Francium":"Fr","Gallium":"Ga","Gadolinium":"Gd","Germanium":"Ge","Hydrogen":"H","Helium":"He","Hafnium":"Hf","Mercury":"Hg","Holmium":"Ho","Hassium":"Hs","Iodine":"I","Indium":"In","Iridium":"Ir","Potassium":"K","Krypton":"Kr","Lanthanum":"La","Lithium":"Li","Lawrencium":"Lr","Lutetium":"Lu","Livermorium":"Lv","Mendelevium":"Md","Magnesium":"Mg","Manganese":"Mn","Molybdenum":"Mo","Meitnerium":"Mt","Nitrogen":"N","Sodium":"Na","Niobium":"Nb","Neodymium":"Nd","Neon":"Ne","Nickel":"Ni","Nobelium":"No","Neptunium":"Np","Oxygen":"O","Osmium":"Os","Phosphorus":"P","Protactinium":"Pa","Lead":"Pb","Palladium":"Pd","Promethium":"Pm","Polonium":"Po","Praseodymium":"Pr","Platinum":"Pt","Plutonium":"Pu","Radium":"Ra","Rubidium":"Rb","Rhenium":"Re","Rutherfordium":"Rf","Roentgenium":"Rg","Rhodium":"Rh","Radon":"Rn","Ruthenium":"Ru","Sulfur":"S","Antimony":"Sb","Scandium":"Sc","Selenium":"Se","Seaborgium":"Sg","Silicon":"Si","Samarium":"Sm","Tin":"Sn","Strontium":"Sr","Tantalum":"Ta","Terbium":"Tb","Technetium":"Tc","Tellurium":"Te","Thorium":"Th","Titanium":"Ti","Thallium":"Tl","Thulium":"Tm","Uranium":"U","Ununoctium":"Uuo","Ununpentium":"Uup","Ununseptium":"Uus","Ununtrium":"Uut","Vanadium":"V","Tungsten":"W","Xenon":"Xe","Yttrium":"Y","Ytterbium":"Yb","Zinc":"Zn", 'Sulphate':"SO4"}
    
    
    cols=[]
    for i in df.columns:
        for key, value in dict_of_elements.items():
            if key in i:
                i= dict_of_elements[key]
           
        cols.append(i)
    
    df.columns=cols  
    print ("processed data")
    df
    

    Gives:

    starting data
    
       Aluminium(III)  NOT a match  Silver ICP-MS  Sulphate-LECO
    0               4           10              1              7
    1               5           11              2              8
    2               6           12              3              9
    
    
    processed data
    Out[4]:
       Al  NOT a match  Ag  SO4
    0   4           10   1    7
    1   5           11   2    8
    2   6           12   3    9
    

    I will make the dictionary keys all uppercase and change to if key in i.upper() to capture different capitalisation.