
Is there a way to obtain the source code of a PDF file in Windows?


I've been looking for a way to obtain the source code of a PDF file: not the hex code, but plain text. My intention is to write a PDF file from plain text, so that I can create a PDF report with an ESP32 or maybe an Arduino board: upload the source code to a program, save it to an SD card and rename it with a .pdf extension.

I know it's more complicated than just adding lines and strings like you would in an HTML document. If I add or delete an object the file will be corrupted, but the plan is to generate a "PDF layout" just like this one:

[Image: PDF layout example]
[Image: PDF layout table example]

That way I wouldn't be deleting or adding any objects, just modifying the strings that already exist. I found I can write PDF files in a text editor like Notepad using plain text, like this example:

%PDF-1.4
1 0 obj
  << /Type /Catalog
      /Outlines 2 0 R
      /Pages 3 0 R
  >>
endobj

2 0 obj
  << /Type /Outlines
      /Count 0
  >>
endobj

3 0 obj
  << /Type /Pages
      /Kids [ 4 0 R ]
      /Count 1
  >>
endobj

4 0 obj
  << /Type /Page
      /Parent 3 0 R
      /MediaBox [ 0 0 612 792 ]
      /Contents 5 0 R
      /Resources << /ProcSet 6 0 R
      /Font << /F1 7 0 R >>
  >>
>>
endobj

5 0 obj
  << /Length 73 >>
stream
  BT
    /F1 24 Tf
    100 100 Td
    ( Hello World ) Tj
  ET
endstream
endobj

6 0 obj
  [ /PDF /Text ]
endobj

7 0 obj
  << /Type /Font
    /Subtype /Type1
    /Name /F1
    /BaseFont /Helvetica
    /Encoding /MacRomanEncoding
  >>
endobj

xref
0 8
0000000000 65535 f
0000000009 00000 n
0000000074 00000 n
0000000120 00000 n
0000000179 00000 n
0000000364 00000 n
0000000466 00000 n
0000000496 00000 n

trailer
  << /Size 8
    /Root 1 0 R
  >>
startxref
625
%%EOF
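A note on the "only change the strings" plan: the `/Length` of the content stream, every byte offset in the `xref` table and the `startxref` value all depend on the exact bytes in front of them (even on whether the editor saves LF or CRLF line endings), so replacing "Hello World" with a longer or shorter string invalidates them. Many viewers silently rebuild a broken xref table, which is probably why hand-edited files still open, but it is safer to let the program count. Below is a minimal Python sketch meant as a PC prototype; `build_pdf` and `pdf_string` are hypothetical names, and on an ESP32 the same bookkeeping (track how many bytes have been written so far) would have to be redone in C/C++.

```python
"""Sketch: rebuild the question's one-page PDF, computing /Length and xref offsets."""


def pdf_string(text: str) -> str:
    # Backslash and parentheses must be escaped inside a (...) literal string.
    return "(" + text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def build_pdf(message: str) -> bytes:
    # Page content, encoded as MacRoman to match the font's /Encoding below.
    content = f"BT\n/F1 24 Tf\n100 100 Td\n{pdf_string(message)} Tj\nET".encode("mac_roman")
    objects = [
        b"<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>",
        b"<< /Type /Outlines /Count 0 >>",
        b"<< /Type /Pages /Kids [ 4 0 R ] /Count 1 >>",
        b"<< /Type /Page /Parent 3 0 R /MediaBox [ 0 0 612 792 ] /Contents 5 0 R"
        b" /Resources << /ProcSet 6 0 R /Font << /F1 7 0 R >> >> >>",
        # /Length counts the stream data only, not the EOL before "endstream".
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"[ /PDF /Text ]",
        b"<< /Type /Font /Subtype /Type1 /Name /F1 /BaseFont /Helvetica"
        b" /Encoding /MacRomanEncoding >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    size = len(objects) + 1  # object 0 is the head of the free list
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"  # every entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (size, xref_pos)
    return bytes(out)
```

Writing `build_pdf("Hello World")` to a file opened in binary mode (`"wb"`) should give a one-page PDF equivalent to the hand-typed one. The object layout is kept identical to the question's, so only the text changes from one report to the next while the counts stay correct.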

So I've been searching for a way to extract that kind of code from my PDF layout, but I've only been able to extract the hex code, which is kind of useless for my purpose. I would be grateful for any help or guidance on this project.
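The "hex" view usually does not mean the PDF syntax itself is unreadable: most PDF writers compress page content with the Flate (zlib) filter (`/Filter /FlateDecode`), so the bytes between `stream` and `endstream` look like binary. A rough Python sketch that inflates those streams so the drawing operators (`BT`, `Tf`, `Td`, `Tj`, ...) become readable; `inflate` and `inflate_streams` are hypothetical helpers, and the naive regex scan does not parse the file properly or handle other filters (JPEG images stay binary).

```python
"""Sketch: inflate Flate-compressed PDF streams so their operators are readable."""
import re
import zlib

# "stream" + CRLF or LF, then the data up to the next "endstream" (naive scan).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate(data: bytes):
    """Return the zlib-decompressed data, or None if it is not complete Flate data."""
    decoder = zlib.decompressobj()
    try:
        result = decoder.decompress(data)
    except zlib.error:
        return None
    # Bytes after the end of the zlib stream (e.g. the EOL before "endstream")
    # end up in decoder.unused_data and are ignored here.
    return result if decoder.eof else None


def inflate_streams(pdf: bytes):
    """Yield every stream's data, decompressed when it is Flate-encoded."""
    for match in STREAM_RE.finditer(pdf):
        raw = match.group(1)
        inflated = inflate(raw)
        # Uncompressed (or otherwise encoded) streams are returned unchanged,
        # minus the end-of-line marker before "endstream".
        yield inflated if inflated is not None else raw.rstrip(b"\r\n")
```

For example, `for data in inflate_streams(open("layout.pdf", "rb").read()): print(data.decode("latin-1"))` prints each stream, where `layout.pdf` stands for your own file. The output shows the content operators, but the surrounding objects, offsets and lengths still have to be rebuilt before the result is a valid file again.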


Solution

  • For what you propose, one potential solution is MuPDF/MuTool. If you wish to decompile an existing PDF, there are options in MuPDF-GL for Windows: use option A to convert it to ASCII and "PrettyPrint" it.

    You can write your own PDF as text, but it has limitations. This, for example, is accepted as a working PDF:

    %PDF-1.2
    4 0 obj
    << >>
    stream
    BT/ 36 Tf((Hello World!))' ET
    endstream
    endobj
    3 0 obj
    << /Type /Page /Parent 2 0 R /Contents 4 0 R >>
    endobj
    2 0 obj
    << /Kids [3 0 R ] /Count 1 /Type /Pages /MediaBox [ -195 -442 400 400 ] >>
    endobj
    1 0 obj
    << /Pages 2 0 R /Type /Catalog >>
    endobj
    trailer
    << /Root 1 0 R >>
    %%EOF
    

    (Courtesy of Thomas; see "Create Memorystream of type pdf and return to browser".)

    If you are "hand balling" UTF-16 characters on a "small device", it becomes a step harder; see https://stackoverflow.com/a/68442444/10802527

    More useful for producing your own: many Raspberry Pi users compile PDFs with MuTool create, see https://mupdf.readthedocs.io/en/latest/mutool-create.html

    The input text that it compiles is much simpler, especially for image handling:

    %%MediaBox 0 0 612 792
    %%Font TmRm Times-Roman
    %%Font Helv-C Helvetica Cyrillic
    %%Font Helv-G Helvetica Greek
    %%Image I0 logo/ClientLogo.png
    
    % Draw the image.
    q
    480 0 0 480 50 250 cm
    /I0 Do
    Q
    
    % Draw a triangle. (Can be rectangles or a grid etc)
    q
    1 0 0 rg
    50 50 m
    100 200 l
    200 50 l
    f
    Q
    
    % Show some text. (PDF y coordinates start at the bottom of the page, so since we humans work downwards, start 50 in from the left and step down: 760, 730, 700, etc.)
    q
    0 0 1 rg
    BT /TmRm 24 Tf 50 760 Td (Hello, from ESP32!) Tj ET
    BT /Helv-C 24 Tf 50 730 Td <fac4d2c1d7d3d4d7d5cad4c521> Tj ET
    BT /Helv-G 24 Tf 50 700 Td ( I am Line 3) Tj ET
    Q
    

    In the result, the PNG background is just 9 pixels, which viewed as text comes to well under 1 KB (actually 227 bytes).
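Because the page file for `mutool create` is plain text, the board (or a script) can generate it and leave all offsets and lengths to MuTool. A minimal Python sketch under stated assumptions: `report_page`, `pdf_literal` and `koi8_hex` are hypothetical helpers, the layout (a title, then boxed rows stepping 30 units down the page, drawn with the `re` and `S` operators) is only an example of a table-like report, and `koi8_hex` assumes the `Cyrillic` font encoding is KOI8-U. Under that assumption it reproduces the hex string in the Cyrillic line above: `<fac4d2c1d7d3d4d7d5cad4c521>` is "Здравствуйте!" ("Hello!") in KOI8.

```python
"""Sketch: generate a page description file for `mutool create`."""


def pdf_literal(text: str) -> str:
    # Inside a (...) string only backslash and parentheses need escaping.
    return "(" + text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def koi8_hex(text: str) -> str:
    # <...> hex string for a font declared with the Cyrillic encoding
    # (assumption: that encoding is KOI8-U).
    return "<" + text.encode("koi8_u").hex() + ">"


def report_page(title, rows, top=760, step=30, left=50, width=500):
    """Title at the top, then one boxed row per (label, value) pair, going down."""
    lines = [
        "%%MediaBox 0 0 612 792",
        "%%Font TmRm Times-Roman",
        "",
        "q",
        "0.5 w",  # line width for the table grid
        f"BT /TmRm 24 Tf {left} {top} Td {pdf_literal(title)} Tj ET",
    ]
    y = top
    for label, value in rows:
        y -= step  # the origin is bottom-left, so each row sits lower on the page
        lines.append(f"{left} {y - 8} {width} {step} re S")  # row box: x y w h
        lines.append(f"BT /TmRm 14 Tf {left + 5} {y} Td {pdf_literal(f'{label}: {value}')} Tj ET")
    lines.append("Q")
    return "\n".join(lines) + "\n"
```

Saving `report_page("Daily report", [("Temp", "21.5 C")])` to `page.txt` and running `mutool create -o report.pdf page.txt` on a PC or Raspberry Pi should then produce the PDF. Adjacent row boxes share an edge, so the rows form a simple grid.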