python, powerpoint, python-pptx

Extract hyperlink from pptx


I want to extract hyperlinks from a pptx file. I know how to do it for Word documents, but does anyone know how to extract them from a pptx?

For example, the pptx contains the text below, and I want to get the URL https://stackoverflow.com/ :


Hello, stackoverflow


I tried to write the Python code to get the text:

from pptx import Presentation
from pptx.opc.constants import RELATIONSHIP_TYPE as RT

ppt = Presentation('data/ppt.pptx')

for i, sld in enumerate(ppt.slides, start=1):
    print(f'-- {i} --')
    for shp in sld.shapes:
        if shp.has_text_frame:
            print(shp.text)

But I only want to print the text and its URL when the text has a hyperlink.


Solution

  • In python-pptx, a hyperlink can appear on a Run, which I believe is what you're after. This means a given shape can contain zero or more hyperlinks. A hyperlink can also be attached to a shape as a whole, so that clicking anywhere on the shape follows the link. In that case, the URL does not appear in the shape's text.

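    from pptx import Presentation
    
    prs = Presentation('data/ppt.pptx')
    
    for slide in prs.slides:
        for shape in slide.shapes:
            # Only shapes with a text frame can contain runs.
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    # address is None when the run carries no hyperlink.
                    address = run.hyperlink.address
                    if address is None:
                        continue
                    # Print the linked text together with its URL.
                    print(run.text, address)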
    

    The relevant sections of the documentation are here:
    https://python-pptx.readthedocs.io/en/latest/api/text.html#run-objects

    and here:
    https://python-pptx.readthedocs.io/en/latest/api/action.html#hyperlink-objects
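
For the shape-level case mentioned above, a rough sketch that collects both kinds of link might look like the following. The helper names `iter_links` and `print_links` are made up for illustration and are not part of python-pptx. It assumes the link on a shape is reachable through `shape.click_action.hyperlink.address`, and that group shapes raise `TypeError` when `click_action` is accessed, which is why that lookup is wrapped in a `try`. Click actions that are not plain URLs (such as a jump to another slide) are not filtered out here, so treat the output as a starting point.

```python
def iter_links(slides):
    """Yield (slide_number, kind, text, url) for each hyperlink found.

    kind is "shape" for a link attached to a whole shape (text is the
    shape name) and "run" for a link on a run of text (text is the run text).
    """
    for number, slide in enumerate(slides, start=1):
        for shape in slide.shapes:
            # Shape-level link: clicking anywhere on the shape follows it.
            try:
                shape_url = shape.click_action.hyperlink.address
            except TypeError:
                # Assumption: group shapes do not support click actions.
                shape_url = None
            if shape_url is not None:
                yield number, "shape", shape.name, shape_url

            # Run-level links live inside the text frame, if there is one.
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    url = run.hyperlink.address
                    if url is not None:
                        yield number, "run", run.text, url


def print_links(path):
    """Print every link in the presentation at path (hypothetical helper)."""
    # Imported here so iter_links can be used without python-pptx loaded.
    from pptx import Presentation

    for number, kind, text, url in iter_links(Presentation(path).slides):
        print(f"slide {number} [{kind}] {text!r} -> {url}")
```

Calling `print_links('data/ppt.pptx')` would then list the text and URL for each link, one per line. Shapes nested inside a group are not visited by this sketch.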