
Simultaneously substituting multiple values with sympy and handling variable ordering dependence


My question concerns substituting multiple symbol variables in a sympy expression using the .subs() method.

All the approaches I know of ultimately apply single-value .subs() calls sequentially. The result therefore depends critically on the order of the substitutions (which can be tricky to predict or manage), and certain simultaneous substitutions (e.g., interchanging two variables) cannot be achieved directly. There is an obvious workaround using dummy variables that removes this order dependence, but I worry it is not optimally efficient. Is there a standard, efficient way to perform a simultaneous multi-value substitution?

In more detail, let's consider two examples related to an expression expr depending on some variables x, y, z, e.g.,

from sympy import symbols
x,y,z=symbols(['x','y','z'])
expr=x+y**2-z

First, for an example of a substitution where order does not matter, consider sending x to the int 1 and y to z. Because order does not matter here, several approaches achieve the substitution; indeed, the following syntax variants all return the same result:

# order-independent substitution example:
EG1_subs1=expr.subs({y:z,x:1})
EG1_subs2=expr.subs([(y,z),(x,1)])
EG1_subs3=expr.subs([(x,1),(y,z)])

# to compare the results:
for j in [EG1_subs1,EG1_subs2,EG1_subs3]:
    print(j)

In these examples it appears that .subs() carries out the substitutions simultaneously, but I think it is actually applying single-value substitutions sequentially: the order is determined by some lexicographical ordering of the variables for EG1_subs1, and by the respective list orderings for EG1_subs2 and EG1_subs3.

Now for an example where the order dependence really matters (and the underlying sequential implementation is insufficient for simultaneous replacement), suppose one wants to use .subs() to simply interchange x and y, i.e., replace every instance of x with y and every instance of y (from the original expression) with x. Repeating the previous example's syntax will not work, and moreover the output even differs between the .subs() syntax variants. Indeed, consider the following:

# order-dependent substitution example:
EG2_subs1=expr.subs({y:x,x:y})
EG2_subs2=expr.subs([(y,x),(x,y)])
EG2_subs3=expr.subs([(x,y),(y,x)])

# to compare the results:
for j in [EG2_subs1,EG2_subs2,EG2_subs3]:
    print(j)

Notice that EG2_subs2 and EG2_subs3 are not the same, which clearly shows that .subs() applied the single-value substitutions sequentially.
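To make the sequential behaviour concrete, the list-based case can be traced by hand, one pair at a time. This is only a sketch restating the example above; the names step1 and step2 are introduced here purely for illustration.

```python
from sympy import symbols

x, y, z = symbols('x y z')
expr = x + y**2 - z

# [(y, x), (x, y)]: first y -> x, then every x (original and newly created) -> y
step1 = expr.subs(y, x)    # equals x + x**2 - z
step2 = step1.subs(x, y)   # equals y + y**2 - z
print(step2 == expr.subs([(y, x), (x, y)]))

# [(x, y), (y, x)]: first x -> y, then every y (original and newly created) -> x
print(expr.subs([(x, y), (y, x)]) == x + x**2 - z)
```

In both orders the second substitution also hits the symbols introduced by the first, so neither list produces the true interchange, which would equal y + x**2 - z.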

Somewhat alarmingly, EG2_subs1 also did not carry out a simultaneous substitution (i.e., simply interchanging x and y), but rather applied single-value substitutions sequentially in an order that depends on the variables' labels. Indeed, EG2_subs1==EG2_subs3, which raises the question of why this holds rather than EG2_subs1==EG2_subs2. The answer appears to be that x precedes y in some lexicographical ordering that .subs() relies on; so, for example, if one happened to use a instead of y in the previous example, the code would yield EG2_subs1==EG2_subs2 instead. I call this "alarming" only because, perhaps naively, I expected sympy calculations not to depend on which labels one assigns to symbol variables.

The differences between simultaneous and sequential (and therefore order-dependent) substitution only arise when original variables are mapped to expressions that contain the original variables. So a 2-step solution to mimic simultaneous substitution would be to first map the original variables to expressions in newly created dummy variables, and then map the dummy variables back to the original variables. Here is an example function for such an approach (written only to work for substituting variables, and only using dict-formatted subs data):
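One way to probe this label dependence without assuming anything about the internal ordering is to compare the dict result against both explicit list orders. The helper below, first_applied, is a hypothetical name introduced for this sketch and is not part of sympy.

```python
from sympy import symbols

def first_applied(p, q, expr):
    """Return the symbol that expr.subs({p: q, q: p}) appears to replace first,
    or None if the dict result matches neither sequential order."""
    result = expr.subs({p: q, q: p})
    if result == expr.subs([(p, q), (q, p)]):
        return p
    if result == expr.subs([(q, p), (p, q)]):
        return q
    return None

x, y, a, z = symbols('x y a z')
print(first_applied(x, y, x + y**2 - z))  # swap involving the labels x and y
print(first_applied(x, a, x + a**2 - z))  # same structure, y relabelled as a
```

If the two printed symbols sit at different positions of the swap (x in one case, a in the other), the order was chosen from the labels rather than from the structure of the expression.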

from sympy import Dummy

def simulSubs(expr,subs_dict):
    # one fresh Dummy per variable being replaced; Dummy symbols are unique by
    # construction, so nothing has to be added to (or deleted from) globals():
    var_to_dummy_dict={var:Dummy(str(var)) for var in subs_dict}
    dummy_to_var_dict={dVar:var for var,dVar in var_to_dummy_dict.items()}
    # rewrite the replacement values in terms of the dummy variables:
    intermediate_subs_dict={a:b.subs(var_to_dummy_dict) for a,b in subs_dict.items()}
    # carry out the first subs (original variables -> expressions in dummies):
    intermediate_expr=expr.subs(intermediate_subs_dict)
    # carry out the final subs (dummies -> original variables):
    final_expr=intermediate_expr.subs(dummy_to_var_dict)
    return final_expr

Indeed, this simulSubs function produces results one expects from simultaneous substitution. E.g., we can use it to interchange variables:

expr_with_var_interchanged=simulSubs(expr,{x:y,y:x})

But is there a more elegant/efficient/standard solution to carry out simultaneous substitution? Is there a standard way to inspect the expression expr, identify all instances of the original variables that we would like to replace, and then replace those instances with their respective assigned replacements?

A final note: in general, simultaneous substitution is not well defined when the values being replaced are allowed to be expressions rather than just variables (which .subs() supports), because expressions can be contained within each other. Perhaps supporting such general expressions is the reason .subs() has to apply single-value substitutions sequentially.
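A small sketch of that ambiguity, using x**2 and x**4 as overlapping targets (this particular expression is an illustrative choice, not taken from the examples above):

```python
from sympy import symbols

x, y, z = symbols('x y z')
expr = x**2 + x**4

# x**4 is matched as (x**2)**2, so replacing x**2 first also rewrites x**4
first = expr.subs([(x**2, y), (x**4, z)])    # equals y + y**2
# replacing x**4 first leaves only the bare x**2 term for the second rule
second = expr.subs([(x**4, z), (x**2, y)])   # equals y + z
print(first)
print(second)
```

Both outcomes are defensible readings of "replace x**2 by y and x**4 by z", so when targets overlap like this a single simultaneous result would need an extra convention.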


Solution

  • These issues are discussed here, and simultaneous replacement is already possible in SymPy via the simultaneous=True keyword of .subs():

    >>> from sympy.abc import x, y
    >>> (x + 2*y).subs({x:y,y:x}, simultaneous=True)
    2*x + y
    

    There is also a topological_sort routine in sympy.utilities.iterables.
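    Applied to the expression from the question, a minimal sketch might look as follows. The comparison with xreplace is an addition here, not part of the answer above: xreplace does exact, single-pass replacement of matching subexpressions, so for plain symbols it should give the same interchange. The variable names swapped and mixed are illustrative.

```python
from sympy import symbols

x, y, z = symbols('x y z')
expr = x + y**2 - z

# interchange x and y in one step
swapped = expr.subs({x: y, y: x}, simultaneous=True)
print(swapped == y + x**2 - z)

# xreplace walks the expression tree once, so symbols it inserts are not revisited
print(expr.xreplace({x: y, y: x}) == swapped)

# replacement values are taken relative to the original expression:
# the x inside x + z is not turned into 1
mixed = expr.subs({x: 1, y: x + z}, simultaneous=True)
print(mixed == 1 + (x + z)**2 - z)
```

    As far as I can tell, simultaneous=True is itself implemented with temporary Dummy symbols, so it is conceptually the same two-step idea as the hand-written simulSubs rather than a fundamentally different algorithm.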