Infix form supporting negative numbers

I am trying to solve a problem namely:

Instructions

Given a mathematical expression as a string you must return the result as a number.

Numbers

Number may be both whole numbers and/or decimal numbers. The same goes for the returned result.

Operators

You need to support the following mathematical operators:

Multiplication *
Division /
Addition +
Subtraction -
Operators are always evaluated from left-to-right, and * and / must be evaluated before + and -.

Parentheses

You need to support multiple levels of nested parentheses, ex. (2 / (2 + 3.33) * 4) - -6

Whitespace

There may or may not be whitespace between numbers and operators.

An addition to this rule is that the minus sign (-) used for negating numbers and parentheses will never be separated by whitespace. I.e., all of the following are valid expressions.

1-1    // 0
1 -1   // 0
1- 1   // 0
1 - 1  // 0
1- -1  // 2
1 - -1 // 2

6 + -(4)   // 2
6 + -( -4) // 10
And the following are invalid expressions

1 - - 1    // Invalid
1- - 1     // Invalid
6 + - (4)  // Invalid
6 + -(- 4) // Invalid
Validation

Thus making '2 /2+3 * 4.75- -6' a valid expression. I have been able to code the polish form for expressions which don't respect the whitespaces but don't give negative numbers. I think I can solve the problem for negative numbers if they respect the whitespaces. My problem is how to actually tokenize the input expression if the whitespaces are not respected and negative numbers are given. This is my algorithm so far:

def is_operator? s
  operators = ['*', '/', '+', '-']
  operators.include?(s)
end

def is_operand? s
    !(s =~ /^[0-9.]+$/).nil?
end

def priority op
  case op
    when "(" , ")" then 0
    when "/", "*" then 2
    else 1
  end
end

def eval(lt,rt,op)
  case op
    when '+' then lt.to_f + rt.to_f 
    when '-' then lt.to_f - rt.to_f  
    when '*' then lt.to_f * rt.to_f  
    when '/' then lt.to_f / rt.to_f  
  end
end

def indent_string s
    s.gsub(/[^[0-9.]]/) { |m| " #{m} "}.split(" ")
end

def create_polish s
    stack = Array.new()
    array = indent_string s
    fpp = ""
    array.each do |item|
        if is_operand? item
            fpp = fpp + item + " "
        else
            if item == '('
                stack << item
            else if is_operator? item
                    while stack.any? && ( priority(stack[-1]) >= priority(item) )
                        fpp = fpp + stack.pop + " "
                    end
                    stack << item
                 else
                    while stack.any? && !(stack[-1] == '(' )
                        fpp = fpp + stack.pop + " "
                    end
                    stack.pop
                 end
            end
        end
    end
    while stack.any?
        fpp = fpp + stack.pop + " "
    end
    fpp
end

def solve_polish s
  stack = Array.new()
  s.split(" ").each do |item|
    unless is_operator? item 
      stack << item
    else
      elements = stack.pop(2)
      stack << eval(elements[0], elements[1], item)
    end
  end
  puts stack.pop
end

solve_polish(create_polish '(5 + 2) * 9 - 10 + ( 7 * (2 + 3) ) - 3 * (2)')

it solves non-negative expressions that don't respect the whitespace rule because I made indent_string method which puts a space before and after each operator and then I just split the string to get the tokens. This way I lose negative numbers sadly. Any ideas?

UPDATE 1: Having thought about it I would need a regex that puts whitespaces front and back if there is no other operator behind him. So '2- -2' would get transformed in '2 - -2' because the second '-' has a '-' preceeding him instead of another number.

Solution

You have parsing code which, essentially, loops through the symbols and tries to recognize each token. The 'stack' part of the expression processing is sophisticated, but the way you recognize each token is very simple.

You need to adapt this tokenizing to use a 'state machine'. At each point, depending on the current state of the machine, you expect a different set of possible next tokens. You use the current set of possible next tokens to help you recognize what the next token is. Each token which you successfully recognize, might also have the consequences of changing the state of the machine. If the next token cannot be any of the possible tokens based on your current state, you have a parse error.

Luckily, your case is just about the simplest possible. You might want to do something like this: Start in EXPRESSION_EXPECTED state. You want to read an opening bracket or a number only. If you read a single 'number', go to OPERATOR_EXPECTED state. If you read an opening bracket, read the whole inner expression recursively. When you get to a closing bracket, go to OPERATOR_EXPECTED state.

Now, when you are in OPERATOR_EXPECTED state, the only thing you are happy to read is an operator. As soon as you have read one, you go back into EXPRESSION_EXPECTED state. (Here an operator means a binary operator, not unary minus.)

You can probably test this scheme out, without worrying about minus numbers, and make sure that you can parse the same things that your code currently parses.

Now, if you are in OPERATOR_EXPECTED, a minus sign means 'subtract' and is an operator. If you are in EXPRESSION_EXPECTED, a minus sign means 'minus' and is the first part of reading a number.

This concept is essential to parsing. Your problem was not expressed in BNF, a standard language for describing syntax, but BNF works well with finite state machines. There is also lots and lots of computer science theory about this stuff, some of it complicated, but much of it accessible.