I'm new to Python and Pandas, so I would be really glad if someone could help me in this matter. My question is the following:
If I have a .txt file with a set of reactions as strings (R1, R2...). Each reaction has compounds (A,B,C,D...) with their respective stoichiometric coefficients (1, 2, 3...) such as:
R1: A + 2B + C <=> D
R2: A + B <=> C
How can I create a data frame in python in the format of a stoichiometric matrix (compounds as rows X reactions as columns) like this:
R1 R2
A -1 -1
B -2 -1
C -1 1
D 1 0
Observation: Compounds on the left side of the equation should have negative stoichiometric values while the ones on the right should be positive
Thanks =D
Try this:
import pandas as pd
import re # regular expressions
def coeff_comp(s):
# Separate stoichiometric coefficient and compound
result = re.search('(?P<coeff>\d*)(?P<comp>.*)', s)
coeff = result.group('coeff')
comp = result.group('comp')
if not coeff:
coeff = '1' # coefficient=1 if it is missing
return comp, int(coeff)
equations = ['R1: A + 2B + C <=> D', 'R2: A + B <=> C'] # some test data
reactions_dict = {} # results dictionary
for equation in equations:
compounds = {} # dict -> compound: coeff
eq = equation.replace(' ', '')
r_id, reaction = eq.split(':') # separate id from chem reaction
lhs, rhs = reaction.split('<=>') # split left and right hand side
reagents = lhs.split('+') # get list of reagents
products = rhs.split('+') # get list of products
for reagent in reagents:
comp, coeff = coeff_comp(reagent)
compounds[comp] = - coeff # negative on lhs
for product in products:
comp, coeff = coeff_comp(product)
compounds[comp] = coeff # positive on rhs
reactions_dict[r_id] = compounds
# insert dict into DataFrame, replace NaN with 0, let values be int
df = pd.DataFrame(reactions_dict).fillna(value=0).astype(int)
The output looks like
R1 R2
A -1 -1
B -2 -1
C -1 1
D 1 0