Tags: python, png, jpeg, converters

Convert png/jpg to a Word file using Python


I need to convert a large number of jpg/png files to docx files and then to pdf. My only goal is to get the text in each image into a pdf file; if I need to edit any text manually, I can do that in Word and save it to the corresponding pdf file.

I tried using an API, but it failed because the extracted text did not match the image correctly.

My image files contain only text and nothing else.

I already have docx to pdf conversion code in Python.

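from docx2pdf import convert

docx_path = 'INPUT_FILE_NAME.docx'
pdf_path = 'OUTPUT_FILE_NAME.pdf'

convert(docx_path)            # writes the PDF next to the .docx file
convert(docx_path, pdf_path)  # writes the PDF to an explicit path
convert("Output")             # converts every .docx file in the "Output" folder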

Please suggest how to convert a png/jpg file to docx. Thanks.

EDIT --------------

I've gotten this code to run and have uploaded it to my GitHub repo.


Solution

  • from PIL import Image
    from pytesseract import pytesseract
    
    # Define path to tesseract.exe
    path_to_tesseract = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
    
    
    #Define path to image
    path_to_image = 'texttoimage.png'
    
    # Point tesseract_cmd to tesseract.exe
    pytesseract.tesseract_cmd = path_to_tesseract
    
    #Open image with PIL
    img = Image.open(path_to_image)
    
    #Extract text from image
    text = pytesseract.image_to_string(img)
    
    print(text)
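
The snippet above stops at printing the text, so here is a rough sketch of the remaining step: writing the extracted text into a docx file. It assumes the python-docx package (`pip install python-docx`), which is not mentioned in the question, and the helper names `split_into_paragraphs` and `save_text_as_docx` are made up for illustration. It also assumes that blank lines in the OCR output mark paragraph breaks, which may not hold for every image.

```python
import re


def split_into_paragraphs(ocr_text):
    """Turn raw OCR output into a list of single-line paragraphs."""
    # Tesseract may append a form feed character as a page separator; drop it.
    cleaned = ocr_text.replace("\f", "")
    # Assumption: blank lines separate paragraphs in the OCR output.
    blocks = re.split(r"\n\s*\n", cleaned)
    # Join the wrapped lines of each block with single spaces.
    paragraphs = [" ".join(block.split()) for block in blocks]
    # Discard blocks that contained only whitespace.
    return [p for p in paragraphs if p]


def save_text_as_docx(ocr_text, docx_path):
    """Write each OCR paragraph as one paragraph of a new .docx file."""
    # Imported here so split_into_paragraphs works without python-docx installed.
    from docx import Document

    document = Document()
    for paragraph in split_into_paragraphs(ocr_text):
        document.add_paragraph(paragraph)
    document.save(docx_path)
```

With the `text` variable from the OCR step, something like `save_text_as_docx(text, 'texttoimage.docx')` should produce a file that can be edited in Word and then passed to `convert` from docx2pdf. Only the text is carried over; fonts and layout from the image are not preserved.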