Search code examples
pythonpandaschemistry

How can I use Python to create a Stoichiometric Matrix


I'm new to Python and Pandas, so I would be really glad if someone could help me in this matter. My question is the following:

If I have a .txt file with a set of reactions as strings (R1, R2...). Each reaction has compounds (A,B,C,D...) with their respective stoichiometric coefficients (1, 2, 3...) such as:

R1: A + 2B + C <=> D

R2: A + B <=> C

How can I create a data frame in python in the format of a stoichiometric matrix (compounds as rows X reactions as columns) like this:

  R1 R2
A -1 -1 
B -2 -1
C -1  1
D  1  0

Observation: Compounds on the left side of the equation should have negative stoichiometric values while the ones on the right should be positive

Thanks =D


Solution

  • Try this:

    import pandas as pd
    import re  # regular expressions
    
    def coeff_comp(s):
        # Separate stoichiometric coefficient and compound
        result = re.search('(?P<coeff>\d*)(?P<comp>.*)', s)
        coeff = result.group('coeff')
        comp = result.group('comp')
        if not coeff:
            coeff = '1'                          # coefficient=1 if it is missing
        return comp, int(coeff)
    
    equations = ['R1: A + 2B + C <=> D', 'R2: A + B <=> C']  # some test data
    reactions_dict = {}                          # results dictionary
    
    for equation in equations:
        compounds = {}                           # dict -> compound: coeff 
        eq = equation.replace(' ', '')  
        r_id, reaction = eq.split(':')           # separate id from chem reaction
        lhs, rhs = reaction.split('<=>')         # split left and right hand side
        reagents = lhs.split('+')                # get list of reagents
        products = rhs.split('+')                # get list of products
        for reagent in reagents:
            comp, coeff = coeff_comp(reagent)
            compounds[comp] = - coeff            # negative on lhs
        for product in products:
            comp, coeff = coeff_comp(product)
            compounds[comp] = coeff              # positive on rhs
        reactions_dict[r_id] = compounds         
    
    # insert dict into DataFrame, replace NaN with 0, let values be int
    df = pd.DataFrame(reactions_dict).fillna(value=0).astype(int)
    

    The output looks like

       R1  R2
    A  -1  -1
    B  -2  -1
    C  -1   1
    D   1   0