python, python-3.x, ms-word, pypdf

Can I check a PDF document for images with a certain marker (alt-text / some other marker I can make in Word) and remove all images with that marker?


I found questions about deleting all images, but can't find anything specific to what I need.

Background:

I'm making answer keys for a math course book. I have a Word document of the book, but don't want to update the answer key each time I update the course book (change order of questions / formatting / ...). I want to write all the answers in the course book directly, and then save as PDF both WITH and WITHOUT the answers.

For text I can do this (create new style --> write all answers with this style selected --> modify style colour to white if I don't want to print the answers). However, some questions have to be solved by drawing. For those I can either redraw the answer in Word or insert an image (a scan of the drawing on paper).

My question:

Mark the images that need to toggle show/don't show

So, I need a way to mark certain images with some kind of tag. I thought maybe the Alt-Text could be useful, but I can't find a way to get the Alt-Text of images in a PDF with Python. But if there is any other kind of marker/tag that I can place on the images that can be read, that's fine too.

Print the document either with or without the images with that marker

Then, I want to have 2 PDF documents: one without those specific images (student version), and one with those images (answer key). I don't mind whether it is done in Word directly on the .docx file or afterwards on the PDF file (preferably with Python).


Solution

  • EDIT: I have since found a much better solution that does use the Alt-Text, by way of a VBA script. See below for details.

    Old method I used

    I found a solution to my specific problem, but I should say up front that it does not involve the Alt-Text of the PDF file.

    Credit goes to a Reddit post that suggested asking ChatGPT: the AI gave me the idea of using Text Boxes in Word, which I then adapted as follows:

    Setup

    1. Create a Style, I'll call it "AnswerTextBox"
    2. Create a Text Box where you want to have a toggleable image (can be both in line or free to move)
    3. Set the Text Box fill and border to "no colour"
    4. Paste the image inside the Text Box
    5. Apply the "AnswerTextBox" Style to the Text Box

    To toggle the images

    1. Be anywhere in the document
    2. Right click the "AnswerTextBox" Style
    3. Modify
    4. Select "Format" and "Paragraph"
    5. Set the "Line Spacing" option to "Exactly"
    6. Type the lowest possible value (0.7 pt for me)
    7. Select OK
    8. The images disappear!
    9. (To show the images again you just set the "Line Spacing" option back to "Single")

    EDIT: Here is the new method I found

    1. Decide on a keyword. I will use [Keyword]
    2. Simply enter [Keyword] into the Alt-Text of a Figure / Drawing / Text Box / Shape
    3. Run this VBA:
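    Sub Hide_Answer_Key()
        Dim shp As Shape
        ' Check every top-level floating shape in the active document
        For Each shp In ActiveDocument.Shapes
            ' The comparison is exact, so the Alt-Text must contain only the keyword
            If shp.AlternativeText = "[Keyword]" Then
                shp.Visible = msoFalse ' Hide the shape
            End If
        Next shp
    End Sub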
    
    4. All Shapes whose Alt-Text is [Keyword] are now hidden. To unhide them, run a second script that is identical except that msoFalse is changed to msoTrue.

    Caveats:

    1. Only the highest level of Shapes will be hidden. This means that if you group some shapes together, you must give the group itself the Alt-Text.
    2. When you hide shapes, the space they take up also collapses. So make sure you wrap the Shapes either in front of or behind the text, so that the text is not affected when a Shape is hidden.

    Benefits:

    1. No need to rely on Text Styles.
    2. No need for Text Boxes.
    3. If your image is a Drawing made of Shapes inside your document, you can edit the Drawing directly without needing to copy the Drawing into your Text Box again.
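
Since the question asked about Python: the Alt-Text marker can also be read from the .docx file itself, without Word. The following is only a rough sketch, not something I have run against a full course book. It assumes the Alt-Text is stored in the `descr` attribute of the `wp:docPr` element inside `word/document.xml`, it only looks at the main document body (not headers or footers), and it only lists the marked drawings without changing the file. The function name `find_marked_drawings` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the wp:docPr element that carries a drawing's name and Alt-Text.
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def find_marked_drawings(docx_file, keyword):
    """Return the names of drawings whose Alt-Text equals `keyword`.

    `docx_file` may be a path or a binary file-like object.
    Assumption: the Alt-Text lives in the `descr` attribute of wp:docPr.
    """
    # A .docx file is a zip archive; the main body is word/document.xml.
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    marked = []
    for doc_pr in root.iter(f"{{{WP_NS}}}docPr"):
        # Exact match after trimming whitespace, like the VBA comparison.
        if doc_pr.get("descr", "").strip() == keyword:
            marked.append(doc_pr.get("name", ""))
    return marked
```

Calling `find_marked_drawings("coursebook.docx", "[Keyword]")` (the file name is a placeholder) should give the names of the marked drawings, which is a quick way to check that every answer image got its marker before running the macro.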