I have a for loop that, in the first iteration, generates a DataFrame like:

    pd.DataFrame(columns = ["Al", "Si", "K", "Th"], data = [[1,2,3,4]])

The second iteration produces a DataFrame that looks like:

    pd.DataFrame(columns = ["W", "Cu"], data = [[5,6]])
Both the column names and the data are generated by the loop in each iteration. I want to add something at the end of the loop that performs an outer join of all these DataFrames, so that the final result is:

    pd.DataFrame(columns = ["Al", "Si", "K", "Th", "W", "Cu"], data = [[1,2,3,4, 0,0], [0,0,0,0, 5,6]])
I've tried `append`, `concat`, and an outer join but can't get it to work, because I need the final DataFrame to be updated on each iteration.

It's also worth mentioning that I can't predefine the full set of columns: the elements depend on the data and are only discovered during the loop.
Edit: here's the loop:

    formulas = ("NaAlSiO2", "WCu2")
    for form in formulas:
        s = re.findall('([A-Z][a-z]?)([0-9]*)', form)
        perc_weight = []
        atoms = []
        total_weight = molecular_w_calc(form)
        for elem, count in s:
            atoms.append(elem)
            # an empty count (e.g. "Na") means a single atom
            perc_weight.append((Element_mass[elem]*100*int(count or 1)) / total_weight)
        perc_df = pd.DataFrame(columns = np.array(atoms), data = [perc_weight])
`Element_mass` is a dictionary mapping each element symbol to its mass, `perc_df` is the DataFrame produced in each iteration, and `molecular_w_calc` returns a single value (the molecular weight of the formula).
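Since `Element_mass` and `molecular_w_calc` aren't shown, here is a hypothetical stand-in for both so the snippets in this thread can be run. It is only a sketch: the dictionary holds standard atomic masses for just the elements used here, and the helper assumes the weight is the sum of mass times count over the formula, which may differ from the real implementation.

```python
import re

# Hypothetical stand-in for Element_mass: standard atomic masses (g/mol)
# for only the elements that appear in this thread.
Element_mass = {
    "Na": 22.990,
    "Al": 26.982,
    "Si": 28.085,
    "O": 15.999,
    "W": 183.84,
    "Cu": 63.546,
}

# Same element/count pattern as the question's loop.
FORMULA_RE = re.compile(r"([A-Z][a-z]?)([0-9]*)")


def molecular_w_calc(formula):
    """Hypothetical helper: sum of atomic mass * atom count.

    An empty count (e.g. "Na" in "NaAlSiO2") means one atom.
    """
    total = 0.0
    for elem, count in FORMULA_RE.findall(formula):
        total += Element_mass[elem] * (int(count) if count else 1)
    return total


print(molecular_w_calc("NaAlSiO2"), molecular_w_calc("WCu2"))
```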
Thanks!
If you want to extend the frame iteratively, then `concat` should actually work. This

    import pandas as pd

    df1 = pd.DataFrame(columns = ["Al", "Si", "K", "Th"], data = [[1,2,3,4]])
    df2 = pd.DataFrame(columns = ["W", "Cu"], data = [[5,6]])
    df = pd.concat([df1, df2], axis='rows')
    df.fillna(0, inplace=True)
gives you
        Al   Si    K   Th    W   Cu
    0  1.0  2.0  3.0  4.0  0.0  0.0
    0  0.0  0.0  0.0  0.0  5.0  6.0
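To get the per-iteration update the question asks for, the same `concat` can run inside the loop, growing the result by one frame each time. A minimal sketch, assuming each iteration yields a one-row frame like `perc_df`; the two literal frames below stand in for the loop's output. `ignore_index=True` renumbers the rows 0, 1, ... instead of repeating 0.

```python
import pandas as pd

# Stand-ins for the perc_df produced in each iteration of the real loop.
frames = [
    pd.DataFrame(columns=["Al", "Si", "K", "Th"], data=[[1, 2, 3, 4]]),
    pd.DataFrame(columns=["W", "Cu"], data=[[5, 6]]),
]

result = None
for perc_df in frames:
    if result is None:
        # First iteration: nothing to join with yet.
        result = perc_df
    else:
        # concat takes the union of the columns (an outer join on columns);
        # new elements are appended to the right in order of appearance.
        result = pd.concat([result, perc_df], ignore_index=True)
    # Elements missing from a row become NaN; replace them with 0.
    result = result.fillna(0)
    # `result` is now up to date and can be inspected inside the loop.

print(result)
```

Each `concat` copies the accumulated frame, so with many iterations it is cheaper to append every `perc_df` to a list and call `pd.concat` once after the loop; growing inside the loop is only worth it if you really need the intermediate frame.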
Just a suggestion: wouldn't you be better off building the underlying data with plain Python first?
Something like
    import re
    import pandas as pd

    re_comps = re.compile(r'([A-Z][a-z]?)([0-9]*)')
    formulas = ("NaAlSiO2", "WCu2")
    elements = {element for formula in formulas
                for element, _ in re_comps.findall(formula)}
    perc_dict = {key: len(formulas) * [None] for key in elements.union({'Formula'})}
    for i, formula in enumerate(formulas):
        perc_dict['Formula'][i] = formula
        total_weight = molecular_w_calc(formula)
        for element, count in re_comps.findall(formula):
            count = 1 if count == '' else int(count)
            perc_dict[element][i] = (Element_mass[element] * 100 * count) / total_weight
and only then hand it to pandas:

    perc_df = pd.DataFrame(perc_dict)
    perc_df.set_index('Formula', drop=True, inplace=True)
    perc_df.sort_index(axis='columns', inplace=True)
The structure of the resulting `perc_df` looks like this (the values are obviously wrong, since I didn't have the `Element_mass` dictionary or the `molecular_w_calc` function):
               Al   Cu   Na    O   Si    W
    Formula
    NaAlSiO2  1.0  NaN  1.0  2.0  1.0  NaN
    WCu2      NaN  2.0  NaN  NaN  NaN  1.0
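For a runnable end-to-end version, here is a small variation on the same idea: it collects one dict per formula instead of preallocating lists, builds the frame with `pd.DataFrame.from_dict(..., orient="index")`, and uses `fillna(0)` to get the zeros the question asks for instead of NaN. `Element_mass` and `molecular_w_calc` are hypothetical stand-ins (standard atomic masses, weight = sum of mass times count), not the asker's originals.

```python
import re

import pandas as pd

# Hypothetical atomic masses (g/mol); the asker's Element_mass is not shown.
Element_mass = {"Na": 22.990, "Al": 26.982, "Si": 28.085,
                "O": 15.999, "W": 183.84, "Cu": 63.546}

re_comps = re.compile(r"([A-Z][a-z]?)([0-9]*)")


def molecular_w_calc(formula):
    # Hypothetical helper: sum of atomic mass * count over the formula.
    return sum(Element_mass[el] * (int(n) if n else 1)
               for el, n in re_comps.findall(formula))


formulas = ("NaAlSiO2", "WCu2")
rows = {}
for formula in formulas:
    total_weight = molecular_w_calc(formula)
    # One dict per formula: element -> weight percent.
    rows[formula] = {
        el: Element_mass[el] * 100 * (int(n) if n else 1) / total_weight
        for el, n in re_comps.findall(formula)
    }

# Outer keys become the index, inner keys the columns; elements missing
# from a formula come out as NaN and are then replaced by 0.
perc_df = pd.DataFrame.from_dict(rows, orient="index").fillna(0)
perc_df.index.name = "Formula"
perc_df = perc_df.sort_index(axis="columns")
print(perc_df.round(2))
```

Because the columns are only discovered from the parsed formulas, this also satisfies the constraint that the set of elements can't be fixed in advance.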