How to convert text extracted with Tesseract to a pandas DataFrame

This is the text I extracted from a cropped image containing a table:

S No PART CODE PART DESCRIPTION

HSN

QTY RATE(Rs)

VALUE DISCOUNT SGST SGST%

CGST CGST%

AMOUNT(Rs)

CHAIN LUBE &

CLEANER KIT-

34039900

0.16

1,406.78 213.5648

11.52

19.22

252.00

3600008

S00ML.

141715

BULB 12V-2VW(BA9S)

85392940

10.17

0.92

12.01

(PARKING)

20.14

264.01

TOTAL

223.73

11.52

20.14

0.01

ROUND OFF

TOTAL

264

This is the image

I want to convert this into a pandas DataFrame. How should I do it?

df = pytesseract.image_to_data('1.jpg', lang='eng', output_type='data.frame')
display(df)

Solution

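You need to pass output_type='data.frame' to pytesseract.image_to_data. With that option (and pandas installed) it returns a DataFrame with one row per detected element, including its bounding box (left, top, width, height), confidence (conf) and recognized text.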

from PIL import Image
import pytesseract

# Run OCR and get the word-level results back as a pandas DataFrame
df = pytesseract.image_to_data(Image.open('your_image.jpeg'), lang='eng', output_type='data.frame')
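
The DataFrame returned by image_to_data is word-level, so it usually needs some post-processing before it resembles the table in the image. As a minimal sketch (not a complete table parser), the helper below groups words back into text lines using the page_num, block_num, par_num, line_num and word_num columns that image_to_data produces. The function name words_to_lines and the min_conf threshold are my own assumptions for illustration, and splitting each line into table columns (for example by the left coordinate) would still depend on your layout.

```python
import pandas as pd


def words_to_lines(ocr_df: pd.DataFrame, min_conf: float = 0) -> pd.DataFrame:
    """Collapse word-level OCR rows into one row per detected text line.

    Hypothetical helper: expects the columns produced by
    pytesseract.image_to_data(..., output_type='data.frame').
    """
    # Structural rows (page, block, paragraph, line) have conf == -1 and no text.
    words = ocr_df[ocr_df["conf"] >= min_conf].dropna(subset=["text"]).copy()

    # Numeric-looking tokens may not be strings, so normalise before joining.
    words["text"] = words["text"].astype(str).str.strip()
    words = words[words["text"] != ""]

    # Words that share page, block, paragraph and line numbers form one line.
    keys = ["page_num", "block_num", "par_num", "line_num"]
    words = words.sort_values(keys + ["word_num"])

    lines = words.groupby(keys, as_index=False).agg(
        left=("left", "min"),    # leftmost x position of the line
        top=("top", "min"),      # topmost y position of the line
        text=("text", " ".join)  # words joined in reading order
    )
    return lines
```

Calling words_to_lines(df) on the DataFrame from the snippet above gives one row per line with its position and joined text. Raising min_conf drops low-confidence words, at the risk of losing small or faint table cells.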