I am trying to solve a problem namely:
Instructions
Given a mathematical expression as a string you must return the result as a number.
Numbers
Number may be both whole numbers and/or decimal numbers. The same goes for the returned result.
Operators
You need to support the following mathematical operators:
Multiplication *
Division /
Addition +
Subtraction -
Operators are always evaluated from left-to-right, and * and / must be evaluated before + and -.
Parentheses
You need to support multiple levels of nested parentheses, ex. (2 / (2 + 3.33) * 4) - -6
Whitespace
There may or may not be whitespace between numbers and operators.
An addition to this rule is that the minus sign (-) used for negating numbers and parentheses will never be separated by whitespace. I.e., all of the following are valid expressions.
1-1 // 0
1 -1 // 0
1- 1 // 0
1 - 1 // 0
1- -1 // 2
1 - -1 // 2
6 + -(4) // 2
6 + -( -4) // 10
And the following are invalid expressions
1 - - 1 // Invalid
1- - 1 // Invalid
6 + - (4) // Invalid
6 + -(- 4) // Invalid
Validation
Thus making '2 /2+3 * 4.75- -6'
a valid expression. I have been able to code the polish form for expressions which don't respect the whitespaces but don't give negative numbers. I think I can solve the problem for negative numbers if they respect the whitespaces. My problem is how to actually tokenize the input expression if the whitespaces are not respected and negative numbers are given. This is my algorithm so far:
def is_operator? s
operators = ['*', '/', '+', '-']
operators.include?(s)
end
def is_operand? s
!(s =~ /^[0-9.]+$/).nil?
end
def priority op
case op
when "(" , ")" then 0
when "/", "*" then 2
else 1
end
end
def eval(lt,rt,op)
case op
when '+' then lt.to_f + rt.to_f
when '-' then lt.to_f - rt.to_f
when '*' then lt.to_f * rt.to_f
when '/' then lt.to_f / rt.to_f
end
end
def indent_string s
s.gsub(/[^[0-9.]]/) { |m| " #{m} "}.split(" ")
end
def create_polish s
stack = Array.new()
array = indent_string s
fpp = ""
array.each do |item|
if is_operand? item
fpp = fpp + item + " "
else
if item == '('
stack << item
else if is_operator? item
while stack.any? && ( priority(stack[-1]) >= priority(item) )
fpp = fpp + stack.pop + " "
end
stack << item
else
while stack.any? && !(stack[-1] == '(' )
fpp = fpp + stack.pop + " "
end
stack.pop
end
end
end
end
while stack.any?
fpp = fpp + stack.pop + " "
end
fpp
end
def solve_polish s
stack = Array.new()
s.split(" ").each do |item|
unless is_operator? item
stack << item
else
elements = stack.pop(2)
stack << eval(elements[0], elements[1], item)
end
end
puts stack.pop
end
solve_polish(create_polish '(5 + 2) * 9 - 10 + ( 7 * (2 + 3) ) - 3 * (2)')
it solves non-negative expressions that don't respect the whitespace rule because I made indent_string method which puts a space before and after each operator and then I just split the string to get the tokens. This way I lose negative numbers sadly. Any ideas?
UPDATE 1: Having thought about it I would need a regex that puts whitespaces front and back if there is no other operator behind him. So '2- -2' would get transformed in '2 - -2' because the second '-' has a '-' preceeding him instead of another number.
You have parsing code which, essentially, loops through the symbols and tries to recognize each token. The 'stack' part of the expression processing is sophisticated, but the way you recognize each token is very simple.
You need to adapt this tokenizing to use a 'state machine'. At each point, depending on the current state of the machine, you expect a different set of possible next tokens. You use the current set of possible next tokens to help you recognize what the next token is. Each token which you successfully recognize, might also have the consequences of changing the state of the machine. If the next token cannot be any of the possible tokens based on your current state, you have a parse error.
Luckily, your case is just about the simplest possible. You might want to do something like this: Start in EXPRESSION_EXPECTED
state. You want to read an opening bracket or a number only. If you read a single 'number', go to OPERATOR_EXPECTED
state. If you read an opening bracket, read the whole inner expression recursively. When you get to a closing bracket, go to OPERATOR_EXPECTED
state.
Now, when you are in OPERATOR_EXPECTED
state, the only thing you are happy to read is an operator. As soon as you have read one, you go back into EXPRESSION_EXPECTED
state. (Here an operator means a binary operator, not unary minus.)
You can probably test this scheme out, without worrying about minus numbers, and make sure that you can parse the same things that your code currently parses.
Now, if you are in OPERATOR_EXPECTED
, a minus sign means 'subtract' and is an operator. If you are in EXPRESSION_EXPECTED
, a minus sign means 'minus' and is the first part of reading a number.
This concept is essential to parsing. Your problem was not expressed in BNF, a standard language for describing syntax, but BNF works well with finite state machines. There is also lots and lots of computer science theory about this stuff, some of it complicated, but much of it accessible.