Tags: python, algorithm, string, tokenizer

How to tokenize a string (which has data about mathematical calculations and floating point numbers)?


I'm trying to tokenize a string (which has data about mathematical calculations) and create a list.

For example:

a = "(3.43 + 2^2 / 4)"

function(a) => ['(', '3.43', '+', '2', '^', '2', '/', '4']

I don't want to use external imports (like nltk).

The problem I'm facing is keeping the floating point numbers intact.

I've been scratching my head for hours and have written two functions, but both break as soon as they hit a floating point number.

Here is what I've done:

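# Assumed helper (not shown in the original post): True for a single digit character
def is_int(ch):
    return ch.isdigit()

a = "(3.43 + 2^2 / 4)"
tokens = []

for x in range(1, len(a)-1):
    no = []

    if a[x] == ".":
        y = x
        no.append(".")

        # Collect the digits to the left of the '.'
        while is_int(a[y-1]):
            no.insert(0, a[y-1])
            y -= 1

        y = x

        # Collect the digits to the right of the '.'
        while is_int(a[y+1]):
            no.extend(a[y+1])
            y += 1

        token = "".join(no)
        no = []
        tokens.append(token)

    else:
        # Every other character, including the digits already used above, is appended on its own
        tokens.append(a[x])

print(tokens)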

OUTPUT:

['3', '3.43', '4', '3', ' ', '+', ' ', '2', '^', '2', ' ', '/', ' ', '4']

Solution

  • Try this

    a = "(3.43 + 2^2 / 4)"
    tokens = []
    no = ""  # characters of the number currently being built

    for ch in a:
        # Concatenate digits or '.' to the current number
        if ch.isdigit() or ch == ".":
            no += ch
            continue
        # Any other character ends the current number, if there is one
        if no:
            tokens.append(no)
            no = ""
        # Spaces separate tokens but are not tokens themselves
        if ch != " ":
            tokens.append(ch)

    # Append a number that ends the string, e.g. in "3 + 4"
    if no:
        tokens.append(no)

    print(tokens)
    

    Output:

    ['(', '3.43', '+', '2', '^', '2', '/', '4', ')']
    

    Note: this won't handle operators that are more than one character, such as ==, ** or +=. It also doesn't validate numbers, so a malformed input like 1.2.3 comes out as a single token.
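
If multi-character operators matter, one possible approach is a regular expression from the standard library's re module (part of Python itself, so not an external import like nltk). This is only a sketch: the name tokenize and the operator list (==, **, +=) are assumptions for illustration, so extend the pattern to whatever operators you need. Longer alternatives are listed before the single-character fallback so that they are tried first.

```python
import re

# Assumed token grammar for this sketch, tried left to right:
#   numbers such as 3.43, 3. or .5, then plain integers,
#   then a few multi-character operators (illustrative list),
#   then any single non-space character.
TOKEN_RE = re.compile(r"\d+\.\d*|\.\d+|\d+|==|\*\*|\+=|\S")

def tokenize(expr):
    """Split an arithmetic expression into a list of string tokens."""
    # findall skips the whitespace between matches, so spaces never become tokens
    return TOKEN_RE.findall(expr)

print(tokenize("(3.43 + 2^2 / 4)"))
print(tokenize("x += 2**3 == 8"))
```

The first call prints the same list as the loop above, and the second keeps +=, ** and == as single tokens. Multi-letter names are still split into single characters, so you would need an extra alternative for identifiers if your expressions contain them.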