python, pdf, fonts, font-size

How to find the font size of every paragraph in a PDF file using Python?


I am working on a project in which I have to find the font size of every paragraph in a PDF file. I have tried several Python libraries: fitz, PyPDF2, pdfrw, pdfminer, and pdfreader. All of them extract the text, but I don't know how to get the font size of the paragraphs. Thanks in advance; any help is appreciated.

I tried the following, but it only returns the text, not the font size.

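import fitz  # PyMuPDF

filepath = '/home/user/Downloads/abc.pdf'
text = ''
with fitz.open(filepath) as doc:
    for page in doc:
        # get_text() is the current method name; older PyMuPDF releases spelled it getText().
        # With no arguments it returns plain text only, without any font information.
        text += page.get_text()
print(text)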

Solution

  • I solved this with pdfminer: each LTChar object exposes its font size through the size attribute. The Python code is given below.

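    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer, LTChar

    path = r'/path/to/pdf'

    Extract_Data = []

    for page_layout in extract_pages(path):
        for element in page_layout:
            # Text boxes are the closest thing pdfminer has to paragraphs.
            if isinstance(element, LTTextContainer):
                Font_size = None  # stays None if the box contains no LTChar
                for text_line in element:
                    for character in text_line:
                        # Skip LTAnno objects (inserted spaces and newlines); they carry no size.
                        if isinstance(character, LTChar):
                            Font_size = character.size
                # Font_size now holds the size of the last character in the box.
                Extract_Data.append([Font_size, element.get_text()])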
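
The loop above records a single size per text box, namely that of the last character it visits. If a paragraph mixes sizes (a superscript, a bold run, a trailing footnote mark), that value can be misleading. A possible variation, shown here only as a sketch, is to collect every character size in the box and keep the most common one. The helper names `dominant_size` and `paragraph_font_sizes` and the one-decimal rounding are assumptions made for illustration; they are not part of pdfminer, and treating each pdfminer text box as a paragraph is itself an approximation.

```python
from collections import Counter


def dominant_size(sizes, ndigits=1):
    """Return the most frequent font size, or None if no sizes were given."""
    # Round first: sizes reported for the same font can differ by tiny float noise.
    counts = Counter(round(size, ndigits) for size in sizes)
    return counts.most_common(1)[0][0] if counts else None


def paragraph_font_sizes(path):
    """Yield (font_size, text) for each text box pdfminer finds in the PDF."""
    # Imported here so dominant_size() stays usable without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page_layout in extract_pages(path):
        for element in page_layout:
            if not isinstance(element, LTTextContainer):
                continue
            # Gather the size of every real character in this text box.
            sizes = [
                character.size
                for text_line in element
                if isinstance(text_line, LTTextLine)
                for character in text_line
                if isinstance(character, LTChar)
            ]
            yield dominant_size(sizes), element.get_text()
```

Calling `list(paragraph_font_sizes(path))` gives the same kind of `[size, text]` pairs as `Extract_Data`, except that each size is the most common one in its box rather than the last one seen.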