python pandas tesseract python-tesseract

How to convert text extracted from tesseract to pandas dataframe


This is the text I extracted from a cropped image containing a table:

S No PART CODE PART DESCRIPTION

HSN

QTY RATE(Rs)

VALUE DISCOUNT SGST SGST%

CGST CGST%

AMOUNT(Rs)

CHAIN LUBE &

CLEANER KIT-

34039900

0.16

1,406.78 213.5648

11.52

19.22

19.22

9

252.00

1

3600008

S00ML.

141715

BULB 12V-2VW(BA9S)

85392940

4

10.17

10.17

0

0.92

0.92

9

12.01

2)

(PARKING)

20.14

18

264.01

TOTAL

223.73

11.52

20.14

18

0.01

ROUND OFF

TOTAL

264

This is the image of the table.

I want to convert this into a pandas DataFrame. How should I do it?

df = pytesseract.image_to_data('1.jpg', lang='eng', output_type='data.frame')
display(df)

Solution

  • Pass output_type='data.frame' to pytesseract.image_to_data so that it returns a pandas DataFrame (pandas must be installed).

    from PIL import Image
    import pytesseract

    # Returns a pandas DataFrame with one row per detected page, block, paragraph, line and word
    df = pytesseract.image_to_data(Image.open('your_image.jpeg'), lang='eng',
                                   output_type='data.frame')
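The DataFrame returned by image_to_data describes words and their positions, not the rows of the table in the image, so some post-processing is usually still needed. The following is a minimal sketch of one possible first step: joining the words back into text lines. It relies on the column names of pytesseract's TSV output (page_num, block_num, par_num, line_num, left, top, conf, text); the helper name words_to_lines and its min_conf parameter are made up for illustration and are not part of pytesseract. It does not split the lines into table columns.

```python
import pandas as pd


def words_to_lines(ocr_df: pd.DataFrame, min_conf: float = 0) -> pd.DataFrame:
    """Collapse word-level OCR rows into one row of text per detected line.

    Hypothetical helper: expects the columns produced by
    pytesseract.image_to_data(..., output_type='data.frame').
    """
    # Layout rows (page, block, paragraph, line) carry no text and conf == -1,
    # so dropping missing text and low-confidence rows leaves only the words.
    words = ocr_df.dropna(subset=["text"]).copy()
    words = words[words["conf"].astype(float) >= min_conf]
    words["text"] = words["text"].astype(str).str.strip()
    words = words[words["text"] != ""]

    # Order the words left to right inside each line before joining them.
    keys = ["page_num", "block_num", "par_num", "line_num"]
    words = words.sort_values(keys + ["left"])
    lines = (
        words.groupby(keys, sort=True)
        .agg(top=("top", "min"), text=("text", lambda s: " ".join(s)))
        .reset_index()
    )
    return lines
```

With the df from the call above, words_to_lines(df) should give one row per OCR line together with its vertical position (top). Whether those lines match the table rows depends on how Tesseract segmented the image; for a table like the one in the question, grouping words by similar top and left values is probably needed to recover the columns.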