I'm trying to tokenize a string containing a mathematical expression into a list of tokens.
For example:
a = "(3.43 + 2^2 / 4)"
function(a) => ['(', '3.43', '+', '2', '^', '2', '/', '4', ')']
I don't want to use external imports (like nltk).
The problem I'm facing is keeping the floating-point numbers intact.
I've been scratching my head for hours and have written two functions, but both break when they encounter floating-point numbers.
Here is what I've done:
    a = "(3.43 + 2^2 / 4)"
    tokens = []
    for x in range(1, len(a)-1):
        no = []
        if a[x] == ".":
            y = x
            no.append(".")
            while is_int(a[y-1]):
                no.insert(0, a[y-1])
                y -= 1
            y = x
            while is_int(a[y+1]):
                no.extend(a[y+1])
                y += 1
            token = "".join(no)
            no = []
            tokens.append(token)
        else:
            tokens.append(a[x])
    print(tokens)
OUTPUT:
['3', '3.43', '4', '3', ' ', '+', ' ', '2', '^', '2', ' ', '/', ' ', '4']
Try this:
    a = "(3.43 + 2^2 / 4)"
    tokens = []
    no = ""
    for ch in a:
        # Concatenate digits or '.' to the current number
        if ch.isdigit() or ch == ".":
            no += ch
        else:
            # Any other character ends the current number, if there is one
            if no != "":
                tokens.append(no)
                no = ""
            # Skip spaces; every other character is a one-character token
            if ch != " ":
                tokens.append(ch)
    # Append a number that ends the string, e.g. in "3 + 4"
    if no != "":
        tokens.append(no)
    print(tokens)
Output:
['(', '3.43', '+', '2', '^', '2', '/', '4', ')']
Note that this won't handle operators that are more than one character long, such as ==, **, or +=.
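If multi-character operators matter, one possible approach is a regular expression from the standard library's re module (so still no external packages). The sketch below is only an illustration: the name tokenize and the exact operator list (**, ==, +=) are assumptions chosen to match the examples above, not a complete grammar. Multi-letter names such as "abc" would still be split into single characters.

```python
import re

# Assumed token grammar (hypothetical, for illustration only):
#   1. a number with an optional fractional part, e.g. 3, 3.43, .5
#   2. one of a few multi-character operators: **, ==, +=
#   3. any other single non-space character
# Alternatives are tried left to right, so the longer patterns come first.
TOKEN_RE = re.compile(r"\d+(?:\.\d*)?|\.\d+|\*\*|==|\+=|\S")

def tokenize(expr):
    """Return the tokens of expr in order, ignoring whitespace."""
    return TOKEN_RE.findall(expr)

print(tokenize("(3.43 + 2^2 / 4)"))
print(tokenize("x += 2**3 == 8"))
```

To support more operators, add them to the pattern before the final \S, with longer operators listed ahead of any shorter ones they start with.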