I want to generate in Python all possible RPN (Reverse Polish notation) expressions, that use letters from an input list (such as ['a', 'b', 'c']
) and contain operators ['+', '-', '*', '/']
.
My idea was that we can add elements to the current expression until one of the following happens: either we've used all letters or expression is complete (i.e. we can't add more operators).
So I wrote the following functions:
1)
'''
The function returns True if we can add operator to current expression:
we scan the list and add +1 to counter when we meet a letter
and we add -1 when we meet an operator (it reduces
last two letters into 1 (say ab+ <--> a + b)
'''
def can_add_operator(string_):
n = 0
for letter in string_:
if letter not in ['+', '-', '*', '/']:
n += 1
else:
n -= 1
result = n > 1
return result
can_add_operator('ab*c+')) #False
can_add_operator('ab*c') #True
2)
'''
The function returns a list, that contains operators and
letters, one of which we can add to the current expression.
'''
def possible_elements(items, string_):
#list of all letters that we have not used yet
result = [x for x in items if x not in string_]
if can_add_operator(string_):
result = ["+", "-", "*", "/"] + result
return result
letters = ['a', 'b', 'c', 'd']
possible_elements(letters, 'ab*c') #['+', '-', '*', '/', 'd']
possible_elements(letters, '') #['a', 'b', 'c', 'd']
possible_elements(letters, 'ab*c+d+') #[]
3) Next I wrap it into recursion:
#exp -- current expression, base of recursion is exp = ''
def rec(exp, Final_sol = []):
elements_to_try = possible_elements(letters, exp)
for i in elements_to_try:
if len(possible_elements(letters, exp + i)) == 0:
Final_sol.append(exp + i)
else:
rec(exp+i, Final_sol)
return Final_sol
#we start with an empty string
Final_sol = rec('')
print(len(Final_sol)) #7680
There are two difficulties with that function:
The first one is that if there are repeated letters in list of letters, it won't return all possible results.
For example if letters = ['a', 'b', 'a']
:
Final_sol = rec('')
print(len(Final_sol)) #32
str(Final_sol)
>> "['ab+', 'ab-', 'ab*', 'ab/', 'ba+', 'ba-', 'ba*', 'ba/', 'ba+', 'ba-',
'ba*', 'ba/', 'ab+', 'ab-', 'ab*', 'ab/', 'ab+', 'ab-', 'ab*', 'ab/', 'ba+',
'ba-', 'ba*', 'ba/', 'ba+', 'ba-', 'ba*', 'ba/', 'ab+', 'ab-', 'ab*',
'ab/']"
So the output missing 'ab+a+'
and so on. But I do want to return all
possible combinations in that case too.
The second problem is that there are many "equivalent" strings in the
output. Since we have the commutative and associative
properties in prefix form, expressions like ab+c+
/abc++
/ca+b+
should be considered as equivalent: I want only one of each group in the
output of the function rec().
How could we change the functions above to overcome such difficulties? What is the most elegant solution to problem?
The first one is that if there are repeated letters in list of letters, it won't return all possible results.
We can attack this problem by using a different approach to generate the permutations:
from itertools import permutations
variables = ['a', 'a', 'b', 'c']
operators = ['+', '-', '*', '/']
equations = set()
for permutation in permutations(variables):
a, b, *rest = permutation
operations = permutations(operators)
for permutation in operations:
equation = zip([a + b, *rest], permutation)
equations.add("".join(variable + operator for variable, operator in equation))
Using a set()
will eliminate any duplications caused by repeated variables.
The second problem is that there are many "equivalent" strings in the output. Since we have the commutative and associative properties
To deal with the commutative issue, we'll use pattern matching to reduce the equations:
import sys
import re
DEBUG = True
remove = set()
# Reduce commutative equivalents: ca*a-b/ same as ac*a-b/
if DEBUG:
print("Reduce commutative equivalents:", file=sys.stderr)
for equation in equations:
if equation not in remove:
for match in re.finditer(r"(?=(.+)(\w)[+*])", equation):
a, _ = match.span(1)
_, d = match.span(2)
equivalent = equation[:a] + match[2] + match[1] + equation[d:]
if equivalent != equation and equivalent in equations:
remove.add(equivalent)
if DEBUG:
print(f"Removed {equivalent} same as {equation}", file=sys.stderr)
equations -= remove
Because we've built all equations as ab op c op d op, etc. I don't believe we generate the associative equivalents, but if we did, we could use a similar technique to try to thin them out:
remove = set()
# Reduce associative equivalents aa+b*c- same as ab*ab*+c-
if DEBUG:
print("Reduce associative equivalents:", file=sys.stderr)
for equation in equations:
if equation not in remove:
for match in re.finditer(r"(?=(\w)([+])(\w)([*]))", equation):
a, _ = match.span(1)
_, d = match.span(4)
equivalent = equation[:a] + match[3] + match[4] + match[1] + match[3] + match[4] + match[2] + equation[d:]
if equivalent != equation and equivalent in equations:
remove.add(equivalent)
if DEBUG:
print(f"Removed {equivalent} same as {equation}", file=sys.stderr)
equations -= remove
And finally dump out our reduced set:
if DEBUG:
print("Final equations:", file=sys.stderr)
print(equations)
OUTPUT
> python3 test.py
Reduce commutative equivalents:
Removed ac+a-b/ same as ca+a-b/
Removed ab*a/c- same as ba*a/c-
Removed cb*a/a- same as bc*a/a-
Removed ac+b-a/ same as ca+b-a/
Removed ba+c/a- same as ab+c/a-
Removed ba+a-c/ same as ab+a-c/
Removed ac+a/b- same as ca+a/b-
Removed ac+b/a- same as ca+b/a-
Removed ac*b-a/ same as ca*b-a/
Removed bc*a-a/ same as cb*a-a/
Removed ca*a-b/ same as ac*a-b/
Removed ba*a-c/ same as ab*a-c/
Removed cb+a/a- same as bc+a/a-
Removed ba+c-a/ same as ab+c-a/
Removed ca*a/b- same as ac*a/b-
Removed ca*b/a- same as ac*b/a-
Removed ba+a/c- same as ab+a/c-
Removed ab*c-a/ same as ba*c-a/
Removed ab*c/a- same as ba*c/a-
Removed cb+a-a/ same as bc+a-a/
Reduce associative equivalents:
Final equations:
{'ca+a-b/', 'cb*a+a-', 'aa/b-c*', 'ba/c-a*', 'cb/a-a*', 'ab+a*c/', 'aa/c+b-',
'bc/a-a+', 'aa*b+c-', 'ba*a/c-', 'ab+c/a*', 'ca-a/b+', 'ca-b+a*', 'bc*a/a-',
'bc/a+a*', 'ac+a/b*', 'bc+a*a-', 'ca/a-b+', 'ac-a*b+', 'ba-a*c/', 'ac/b-a*',
'ba-c+a*', 'ba+a-c*', 'aa+b/c-', 'ca-b*a/', 'ca+b-a/', 'ab+c/a-', 'ac*b+a-',
'aa+c-b/', 'aa*c/b-', 'ab/c*a+', 'ac+b/a*', 'aa+b*c/', 'ab-a*c+', 'ac+a-b*',
'cb-a+a*', 'cb*a/a+', 'ab-c/a+', 'ac*b+a/', 'ba*c/a+', 'ba/c+a*', 'aa-b*c+',
'aa/b+c*', 'ab-c*a+', 'ac+a*b/', 'ac/b+a-', 'aa*b-c+', 'ac-a+b/', 'aa-c*b+',
'ab+a-c/', 'aa-c+b/', 'ba+c*a/', 'ca-b*a+', 'ab-a/c*', 'aa-b/c+', 'ac*a+b/',
'ba/a+c-', 'ba-c/a+', 'cb/a+a*', 'ca+b/a*', 'aa/c*b+', 'ac-a+b*', 'ba-a+c*',
'ca+a*b/', 'aa+b/c*', 'aa/c-b+', 'bc*a/a+', 'ca+a/b-', 'ca+b/a-', 'ca*b-a/',
'ac/b*a-', 'aa*b/c+', 'ba/a*c+', 'bc/a*a+', 'ca-b+a/', 'ac/b+a*', 'aa*b/c-',
'bc-a+a/', 'ca/b-a*', 'ba-c*a/', 'cb*a-a/', 'ba-c/a*', 'aa*b+c/', 'ac*a-b/',
'ca*b/a+', 'aa+b-c*', 'ba/a-c*', 'ca-b/a+', 'ab/c-a+', 'cb+a/a*', 'aa-c/b*',
'ba+c*a-', 'cb*a+a/', 'aa*c/b+', 'ab/c+a*', 'ca+b-a*', 'aa+b-c/', 'ac-b*a/',
'ab*a-c/', 'ba-a*c+', 'ba*c+a-', 'bc/a*a-', 'ba*c-a+', 'ba/c*a+', 'ab-c+a/',
'ba*c+a/', 'ca*a-b+', 'bc+a/a-', 'aa+c*b-', 'ab+c*a-', 'ac-a/b+', 'ca+a-b*',
'aa+c-b*', 'ab/c*a-', 'ab+c-a/', 'bc+a/a*', 'ac-a/b*', 'ab/a-c*', 'ac/a-b+',
'bc-a/a+', 'ab+a*c-', 'ac/a-b*', 'ca*a+b-', 'ab/a-c+', 'ab-a*c/', 'cb/a*a-',
'ac/a+b*', 'bc-a/a*', 'ac-b+a*', 'ac*a/b-', 'ba*a+c-', 'ba/a-c+', 'bc/a+a-',
'aa/b-c+', 'cb+a-a*', 'ca-b/a*', 'ca+b*a-', 'ac*b/a-', 'ca-a+b/', 'ca/b*a-',
'ba+a/c*', 'cb-a*a+', 'ac+a*b-', 'aa*b-c/', 'aa*c-b/', 'ac/a*b+', 'aa-c+b*',
'ca*a+b/', 'ca/b+a-', 'ac*a/b+', 'aa+c/b-', 'ab/c+a-', 'ab+a/c-', 'cb-a+a/',
'ab*a-c+', 'ab-a+c*', 'ab+a/c*', 'ac/b-a+', 'ab*c+a/', 'ba/c+a-', 'ba/c*a-',
'cb-a*a/', 'ac+b*a-', 'ba+c-a*', 'ac/b*a+', 'cb/a*a+', 'cb-a/a+', 'bc*a+a/',
'ac*b/a+', 'cb+a*a-', 'ba*c-a/', 'ca-a*b/', 'ca-a*b+', 'ab/a*c-', 'ba-a+c/',
'ba*a/c+', 'bc-a+a*', 'ca+a/b*', 'ca*a/b+', 'aa*c+b-', 'ba*c/a-', 'bc/a-a*',
'ca/a+b*', 'ab-a+c/', 'ca/b*a+', 'ab-a/c+', 'cb*a-a+', 'aa-b/c*', 'ac-b/a+',
'aa*c-b+', 'ab*c+a-', 'cb/a-a+', 'ab/a+c*', 'ba+a*c-', 'ba*a+c/', 'ba-a/c*',
'aa/b+c-', 'ba/c-a+', 'ca/b-a+', 'ab*a/c+', 'bc+a-a*', 'bc*a-a+', 'ab+c*a/',
'ab-c*a/', 'ac*a+b-', 'ca/a+b-', 'ac/a*b-', 'ac+b-a*', 'ba/a+c*', 'ba-a/c+',
'ab*c/a+', 'cb/a+a-', 'ca/a-b*', 'ac-b/a*', 'ab/a*c+', 'ca*b+a/', 'ac-a*b/',
'aa/b*c+', 'aa/c-b*', 'ca/a*b+', 'bc-a*a/', 'ca+b*a/', 'aa*c+b/', 'ab*a+c/',
'bc+a*a/', 'ab-c/a*', 'ca-a+b*', 'aa-c*b/', 'cb-a/a*', 'aa+b*c-', 'ca+a*b-',
'aa-b+c*', 'ac/a+b-', 'ba-c+a/', 'ba-c*a+', 'ca*b-a+', 'ac-b+a/', 'aa-b*c/',
'aa-b+c/', 'ac*a-b+', 'ac+b*a/', 'ca/a*b-', 'bc+a-a/', 'bc-a*a+', 'ba+a*c/',
'ac*b-a+', 'aa/c+b*', 'ab/a+c-', 'ab/c-a*', 'ab-c+a*', 'ba+c/a*', 'ab*c-a+',
'ab+a-c*', 'cb+a*a/', 'ac-b*a+', 'ba/a*c-', 'ab*a+c-', 'ab+c-a*', 'bc*a+a-',
'aa/b*c-', 'ca*b+a-', 'ba*a-c+', 'ca/b+a*', 'aa-c/b+', 'aa+c/b*', 'ca-a/b*',
'aa/c*b-', 'aa+c*b/'}
>
I'm not claiming a perfect solution, just illustrating some of the tools available to you to solve your problem.