Print all objects inside a PDF file with Python

I'd like to list all objects present in a PDF file: text blocks, images, fonts, page objects, but also vector shapes (if any).

I hoped to see all of them with PyMuPDF:

import fitz  # pip install PyMuPDF
doc = fitz.open('test.pdf')
for xref in range(1, doc.xref_length()):
    print(doc.xref_object(xref))

but not everything is there. For example, text is not there. Text can be obtained separately with:

print(doc.load_page(0).get_text('dict'))

but I'm looking for a general method, rather than one method for text elements, another for images, and so on.

Question: how to print all objects present in a PDF file? (text blocks, images, vector shapes, etc.)

Notes:

I've already read How to extract text from a PDF file? and similar questions but this is specific to text, whereas I'm looking for all objects / attributes.
I've already read How to open PDF raw? but it did not help here.
When opening a PDF with a text editor, we see a lot of human-unreadable binary data (it seems that it is not only for images).

TL;DR: I'm looking for a representation like:

Object0
    TYPE:TEXT
    CONTENT:lorem ipsum
    POSITION:123,123

Object1
    TYPE:IMAGE
    ...

Object2
    TYPE:...
    ...

Solution

Bear with me, please.

This isn't an answer but is really a complex comment in response to the overloaded use of the term "object" not only by the OP and commenters, but also by the PDF spec itself.

PDF is really just JSON on steroids

PDF has first-class support for booleans, integers, real numbers, strings, names, arrays, dictionaries, streams, and a singleton null object. But instead of describing the document as one giant dictionary, PDF allows defining objects with an object-id and referencing them later by that object-id. These are called indirect objects. The PDF document is actually just a bag of objects, with an index (the cross-reference table) and a pointer to the "root" object (in the trailer) at the tail of the file.
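To make the "bag of objects" idea concrete, here is a minimal sketch in plain Python. It is not a PDF parser: the `Ref` class, the `objects` dictionary, and the `resolve` helper are names I made up for illustration, the object contents are heavily simplified, and generation numbers are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Stand-in for an indirect reference, written `4 0 R` in PDF syntax."""
    obj_id: int


# Hypothetical, simplified bag of indirect objects keyed by object-id.
objects = {
    1: {"Type": "Catalog", "Pages": Ref(2)},
    2: {"Type": "Pages", "Kids": [Ref(3)], "Count": 1},
    3: {"Type": "Page", "Parent": Ref(2), "Contents": Ref(4)},
    4: "BT /F1 18 Tf 0 0 Td (Hello World) Tj ET",  # stream data, simplified
}
root = Ref(1)  # what the pointer at the tail of the file would give us


def resolve(value, objects, seen=()):
    """Recursively replace references with the objects they point to."""
    if isinstance(value, Ref):
        if value.obj_id in seen:
            # Pages point back to their parent, so cycles must be cut.
            return f"<cycle to object {value.obj_id}>"
        return resolve(objects[value.obj_id], objects, seen + (value.obj_id,))
    if isinstance(value, dict):
        return {key: resolve(item, objects, seen) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, objects, seen) for item in value]
    return value


tree = resolve(root, objects)
print(tree)
```

Following the references from the root turns the flat bag into one nested, JSON-like tree, which is roughly the "one giant dictionary" view of the document.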

INDIRECT OBJECTS

These objects in the PDF that have an object-id are what is typically meant by the informal use of the term "objects in a PDF". They are used to describe the structure of the document and all the resources that are needed to produce it. However, most of these objects hold none of the actual content.

STREAMS hold the content

Streams are used to hold a small postfix-based command language that is interpreted by the PDF viewer. Here is an example from https://brendanzagaeski.appspot.com/0004.html: a valid snippet of PDF showing an indirect object with object-id 4 and of type stream. My comments are on the right.

4 0 obj                 begin indirect object 4
  << /Length 55 >>      { 'Length': 55}
stream                  begin stream type
  BT                        begin-text-object command
    /F1 18 Tf               change-font to font with descriptor F1 at size 18pt
    0 0 Td                  position-text at x=0, y=0
    (Hello World) Tj        render-text "Hello World"
  ET                        end-text-object command
endstream               end stream type
endobj                  end object
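The stream above is stored uncompressed, which is why it is readable. In most real files the bytes between stream and endstream are compressed, commonly with the /FlateDecode filter (zlib/deflate), and that accounts for much of the "binary data" seen in a text editor. A minimal sketch of the round trip, using only the standard library; the variable names are mine:

```python
import zlib

# Content-stream operators, as in the example above.
operators = b"BT /F1 18 Tf 0 0 Td (Hello World) Tj ET"

# Roughly what a writer using /Filter /FlateDecode stores between
# "stream" and "endstream": zlib-compressed bytes.
stored = zlib.compress(operators)

# What a reader has to do before it can see the operators again.
decoded = zlib.decompress(stored)

print(stored)                   # compressed bytes, not meant for humans
print(decoded.decode("ascii"))  # the operators, readable again
```

As far as I know, PyMuPDF's `doc.xref_object(xref)` only prints the object's dictionary, while `doc.xref_stream(xref)` returns the decoded stream bytes for objects that are streams, so that would be the place to look for the text operators.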

GRAPHICS OBJECTS - the twist in the knickers

The PDF spec refers to all of the elements instantiated by commands inside of a stream as "graphics objects". Yes, even text objects are graphics objects. However, these objects aren't declared with properties; they are defined by instructions on how to build them, governed by an overarching state machine (the spec illustrates it with a state diagram of the graphics objects).

THE PAIN

So here's the twist: if you want all the graphics objects in the following form:

{ 'content': [
    { 'type': 'text', 'position': [0,0], 'text': "Hello World" }
]}

you have to build an interpreter that keeps track of the graphics state and stores the objects away as the commands that create them are executed. A basic PDF viewer doesn't have to do this, because its interpreter maps closely onto the graphics API, and the graphics state is held by the graphics layer.
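To give an idea of what such an interpreter involves, here is a toy sketch that handles only the five operators from the Hello World stream above (BT, Tf, Td, Tj, ET). Everything in it is an assumption made for illustration: the function name, the output keys, and the tokenizer, which ignores nested parentheses, hex strings, arrays, inline images, and so on. It also ignores the transformation matrices and does not advance the position after showing text, which a real interpreter would have to do using the font's glyph widths.

```python
import re

# Tokens: (literal string), /Name, number, or operator keyword.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|/[^\s/()\[\]<>]+|[-+]?\d*\.?\d+|[A-Za-z*]+")
NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def extract_text_objects(stream):
    """Run a tiny subset of the content-stream language, collecting text."""
    operands = []  # postfix language: operands come before the operator
    found = []
    state = {"font": None, "size": None, "pos": [0.0, 0.0]}
    for tok in TOKEN.findall(stream):
        if tok.startswith("("):
            operands.append(tok[1:-1])      # string operand
        elif tok.startswith("/"):
            operands.append(tok[1:])        # name operand
        elif NUMBER.fullmatch(tok):
            operands.append(float(tok))     # numeric operand
        else:                               # anything else is an operator
            if tok == "BT":                 # begin text: reset position
                state["pos"] = [0.0, 0.0]
            elif tok == "Tf":               # set font and size
                state["font"], state["size"] = operands[-2], operands[-1]
            elif tok == "Td":               # move by (tx, ty)
                state["pos"] = [state["pos"][0] + operands[-2],
                                state["pos"][1] + operands[-1]]
            elif tok == "Tj":               # show text: record an object
                found.append({
                    "type": "text",
                    "position": list(state["pos"]),
                    "text": operands[-1],
                    "font": state["font"],
                    "size": state["size"],
                })
            operands.clear()                # unknown operators are skipped
    return found


content = "BT /F1 18 Tf 0 0 Td (Hello World) Tj ET"
text_objects = extract_text_objects(content)
print({"content": text_objects})
```

Even this toy version has to carry state (font, size, position) from one command to the next, and that is the part that grows quickly once paths, images, colors, and matrices are added.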

So when you say objects...

Do you mean:

Indirect objects
The document catalog in JSON format
All the graphics objects
All of the above

References

All images came out of the PDF specification

https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf