
How to format a Python variable from table output before inserting it into a database


I have a variable holding the text that was extracted (with OCR) from an image of a table form.

Can someone suggest how I can format it properly before inserting it into a database?


Below is the code and output of the variable:

import pytesseract

output = pytesseract.image_to_string(image)  # image: the table image (e.g. a PIL Image or a file path)
print(output)

Output of print(output):

1) JP *00000.0000/UNT 0.07704 61628.21 0%(E) 0.00 ND
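Because image_to_string returns the text of the whole image as a single string, a table with several rows will normally come back as several lines, possibly with blank lines between or after them. A minimal sketch of turning that string into one entry per row, assuming the OCR text is already in output; the sample string is the row from the question with trailing newlines added as an assumption about what real OCR text can look like:

```python
# Hypothetical OCR result: the row from the question plus trailing blank lines
# (assumption; real output depends on the image and the Tesseract version).
output = '1) JP *00000.0000/UNT 0.07704 61628.21 0%(E) 0.00 ND\n\n'

# Keep only lines with non-whitespace content; each one is treated as a table row.
rows = [line.strip() for line in output.splitlines() if line.strip()]
print(rows)
```

Each entry of rows can then be parsed on its own, so one badly recognized line does not affect the others.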

Solution

  • You could first split the string on whitespace, then try to convert each token to a float, leaving the tokens that are not numbers as strings.

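    output = '1) JP *00000.0000/UNT 0.07704 61628.21 0%(E) 0.00 ND'
    l = output.split()  # split on runs of whitespace
    for idx, token in enumerate(l):
        try:
            l[idx] = float(token)  # numeric tokens such as '0.07704' become floats
        except ValueError:
            continue  # non-numeric tokens such as 'JP' stay as strings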
    

    l is now a list in which the numeric tokens are floats:

    ['1)', 'JP', '*00000.0000/UNT', 0.07704, 61628.21, '0%(E)', 0.0, 'ND']
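To finish the original goal, the converted list can be passed to a parameterized INSERT. The question does not say which database is used, so this is a minimal sketch with Python's built-in sqlite3 module and an in-memory database. The helper to_values, the table name ocr_rows, the column names col1 to col8, and the assumption that every row has 8 fields are all hypothetical. The ? placeholders let the driver quote each value instead of building the SQL string by hand (other DB-API drivers may use a different placeholder style, e.g. %s in psycopg2).

```python
import sqlite3

EXPECTED_FIELDS = 8  # token count of the example row (assumed to hold for other rows)


def to_values(line):
    """Hypothetical helper: split one OCR line and convert numeric tokens to float."""
    values = []
    for token in line.split():
        try:
            values.append(float(token))
        except ValueError:
            values.append(token)  # keep non-numeric tokens as strings
    return values


output = '1) JP *00000.0000/UNT 0.07704 61628.21 0%(E) 0.00 ND'
row = to_values(output)
if len(row) != EXPECTED_FIELDS:
    # OCR can merge or split tokens; reject rows that do not fit the schema.
    raise ValueError(f'expected {EXPECTED_FIELDS} fields, got {len(row)}: {row!r}')

conn = sqlite3.connect(':memory:')  # in-memory database, for illustration only
# Hypothetical table and column names: the question does not give a schema.
conn.execute(
    'CREATE TABLE ocr_rows (col1 TEXT, col2 TEXT, col3 TEXT, col4 REAL, '
    'col5 REAL, col6 TEXT, col7 REAL, col8 TEXT)'
)
# '?' placeholders: the driver binds each value with its Python type.
conn.execute('INSERT INTO ocr_rows VALUES (?, ?, ?, ?, ?, ?, ?, ?)', row)
conn.commit()

print(conn.execute('SELECT * FROM ocr_rows').fetchall())
```

For several OCR lines, the same statement can be run once with conn.executemany over a list of converted rows.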