
Trying to open a manually created PDF in CorelDRAW but failing


Update (solved): I originally thought my problem was a general one, but after @KJ checked my PDF file, it turned out to be a mistake in my own file. So the solution in the comments is specific to my situation.

A suggestion for me and for others in a similar situation in the future: check the file structure carefully. The pointer values (byte offsets and lengths) must be 100% correct; some readers tolerate small errors, but others will fail to open the file.
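One way to do that check without counting bytes by hand is a small script. The following is a minimal Python sketch under stated assumptions: the hypothetical `check_xref` helper reads `startxref` from the tail of the file, jumps to the xref table, and checks that every in-use entry lands exactly on `N G obj`. It assumes a single uncompressed xref table with one subsection, no incremental updates and no cross-reference streams, so it does not apply to the Acrobat-saved file further down.

```python
import re
import sys


def check_xref(data: bytes) -> list:
    """Return problems found in a PDF's classic xref table (empty list = OK).

    Sketch only: assumes one uncompressed xref table with a single
    subsection, no incremental updates and no cross-reference streams.
    """
    problems = []
    # Like a reader, start from the tail: the last "startxref" gives the xref offset.
    tail = data.rfind(b"startxref")
    m = re.match(rb"startxref\s+(\d+)", data[tail:]) if tail >= 0 else None
    if m is None:
        return ["no usable startxref found"]
    xref_pos = int(m.group(1))
    if not data.startswith(b"xref", xref_pos):
        problems.append(f"startxref {xref_pos} does not point at 'xref'")
        # Fall back to searching for the table so its entries can still be checked.
        xref_pos = data.rfind(b"xref", 0, tail)
        if xref_pos < 0:
            return problems + ["no xref table found"]

    section = data[xref_pos:tail]
    end = section.find(b"trailer")
    if end >= 0:
        section = section[:end]
    head = re.match(rb"xref\s+(\d+)\s+(\d+)", section)
    if head is None:
        return problems + ["malformed xref subsection header"]
    first, count = int(head.group(1)), int(head.group(2))
    entries = re.findall(rb"(\d{10}) (\d{5}) ([nf])", section[head.end():])[:count]
    if len(entries) < count:
        problems.append(f"header announces {count} entries, found {len(entries)}")

    for i, (off, gen, kind) in enumerate(entries):
        if kind != b"n":  # free entries (such as entry 0) have no object to check
            continue
        num, offset = first + i, int(off)
        expected = b"%d %d obj" % (num, int(gen))
        if not data.startswith(expected, offset):
            # Where does the object really start? (not preceded by another digit)
            found = re.search(rb"(?<![0-9])" + re.escape(expected), data)
            actual = found.start() if found else "nowhere"
            problems.append(f"object {num}: xref says {offset}, actually at {actual}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:  # raw bytes: offsets are byte counts
        for problem in check_xref(f.read()):
            print(problem)
```

Run it as, for example, `python check_xref.py yourfile.pdf` (script and file names are placeholders); each reported line names an object whose xref entry does not match where the object actually starts.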

Original question:

I am trying to create PDF files manually as the export functionality of my project. It almost works: the files open in Adobe Acrobat and Illustrator and are completely editable, but they do not open in CorelDRAW.

What I assumed:

When I open the file in Adobe Acrobat, it always asks me to save, even if nothing has changed. After saving, the file opens in CorelDRAW without problems. So I thought Acrobat did something to repair my file. I want to know what it did, and whether it is possible to create a PDF that fully complies with the standard, as Acrobat's output does, so that it can be opened in CDR directly.

I don't have enough background on this. Many thanks in advance for any help, comments, and suggestions.

Associated files (I embedded the font files in the PDF, so I won't paste the code here; it would be very long):

Here is the manually created PDF

Here is the PDF after saving within Acrobat

Alternatively, here is a similar case:

Raw PDF, created manually, which CDR cannot read:

%PDF-1.4
<hex chars removed>
1 0 obj
<< /Pages 2 0 R /Type /Catalog >>
endobj
2 0 obj
<< /Count 1 /Kids [ 3 0 R ] /Type /Pages >>
endobj
3 0 obj
<< /Contents 4 0 R /MediaBox [ 0 0 500 800 ] /Parent 2 0 R /Resources 5 0 R /Type /Page >>
endobj
4 0 obj
<< /Length 57 >>
stream
BT /F1 24 Tf 175 720 Td <FEFF004821260065006C006C006F> Tj ET
endstream
endobj
5 0 obj
<< /Font << /F1 6 0 R >> >>
endobj
6 0 obj
<< /BaseFont /Courier /Subtype /Type1 /Type /Font >>
endobj
xref
0 7
0000000000 65535 f 
0000000015 00000 n 
0000000064 00000 n 
0000000123 00000 n 
0000000229 00000 n 
0000000335 00000 n 
0000000378 00000 n 
trailer << /Root 1 0 R /Size 7 /ID [<89311a609a751f1666063e6962e79bd5><89311a609a751f1666063e6962e79bd5>] >>
startxref
448
%%EOF

After saving within Acrobat:

%PDF-1.6
%âãÏÓ
7 0 obj
<</Linearized 1/L 4686/O 9/E 1039/N 1/T 4397/H [ 443 130]>>
endobj
                       
12 0 obj
<</DecodeParms<</Columns 4/Predictor 12>>/Filter/FlateDecode/ID[<89311A609A751F1666063E6962E79BD5><1AF678699D64704CBB4D6708F363F3FE>]/Index[7 9]/Info 6 0 R/Length 47/Prev 4398/Root 8 0 R/Size 16/Type/XRef/W[1 2 1]>>stream
<binary stream data removed>
endstream
endobj
startxref
0
%%EOF
      
15 0 obj
<</Filter/FlateDecode/I 66/Length 51/S 38>>stream
<binary stream data removed>
endstream
endobj
8 0 obj
<</Metadata 1 0 R/Pages 5 0 R/Type/Catalog>>
endobj
9 0 obj
<</Contents 11 0 R/CropBox[0 0 500 800]/MediaBox[0 0 500 800]/Parent 5 0 R/Resources 13 0 R/Rotate 0/Type/Page>>
endobj
10 0 obj
<</Filter/FlateDecode/First 11/Length 72/N 2/Type/ObjStm>>stream
<binary stream data removed>
endstream
endobj
11 0 obj
<</Length 62>>stream
BT /F1 24 Tf 175 720 Td <FEFF004821260065006C006C006F> Tj ET

endstream
endobj
1 0 obj
<</Length 2988/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="ï»¿" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c017 91.164464, 2020/06/15-10:20:05        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
         <xmp:ModifyDate>2024-02-16T11:43:39+01:00</xmp:ModifyDate>
         <xmp:CreateDate>2024-02-16T11:43:39+01:00</xmp:CreateDate>
         <xmp:MetadataDate>2024-02-16T11:43:39+01:00</xmp:MetadataDate>
         <dc:format>application/pdf</dc:format>
         <xmpMM:DocumentID>uuid:51548485-b977-470f-8fa3-e46b5a112535</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:8543361a-d2ca-46bb-8d6c-d303ebd0decf</xmpMM:InstanceID>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                           
<?xpacket end="w"?>
endstream
endobj
2 0 obj
<</Filter/FlateDecode/First 4/Length 48/N 1/Type/ObjStm>>stream
<binary stream data removed>
endstream
endobj
3 0 obj
<</Filter/FlateDecode/First 4/Length 62/N 1/Type/ObjStm>>stream
<binary stream data removed>
endstream
endobj
4 0 obj
<</DecodeParms<</Columns 3/Predictor 12>>/Filter/FlateDecode/ID[<89311A609A751F1666063E6962E79BD5><1AF678699D64704CBB4D6708F363F3FE>]/Info 6 0 R/Length 37/Root 8 0 R/Size 7/Type/XRef/W[1 2 0]>>stream
<binary stream data removed>
endstream
endobj
startxref
116
%%EOF

I thought the key was to understand which of the changes Acrobat made is what lets CDR recognize the file.
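Comparing the two listings, Acrobat raised the header version from 1.4 to 1.6, linearized the file, packed objects into compressed object streams, replaced the plain xref table with cross-reference streams, and added XMP metadata. A rough way to see which of these structures a given file contains is a byte scan like the following Python sketch (`describe` and `FEATURES` are hypothetical names). It only lists what Acrobat changed; it cannot tell you which change CorelDRAW actually needed.

```python
import re
import sys

# Structures present in the Acrobat-saved listing but absent from the
# hand-written one. A raw byte scan is only a heuristic: the same names
# could in principle also appear inside string or compressed data.
FEATURES = {
    "linearized": rb"/Linearized\b",          # "Fast Web View" dictionary
    "object_streams": rb"/Type\s*/ObjStm\b",  # objects packed into compressed streams
    "xref_streams": rb"/Type\s*/XRef\b",      # cross-reference streams (PDF 1.5+)
    "xmp_metadata": rb"/Type\s*/Metadata\b",  # XMP metadata packet
    "classic_xref": rb"(?m)^xref\s",          # plain-text "xref" table
}


def describe(data: bytes) -> dict:
    """Report the header version and which of FEATURES appear in the raw bytes."""
    version = re.match(rb"%PDF-(\d+\.\d+)", data)
    report = {"version": version.group(1).decode("ascii") if version else None}
    for name, pattern in FEATURES.items():
        report[name] = re.search(pattern, data) is not None
    return report


if __name__ == "__main__":
    for path in sys.argv[1:]:  # e.g. the hand-written file and the Acrobat-saved one
        with open(path, "rb") as f:
            print(path, describe(f.read()))
```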


Solution

  • Adobe Acrobat will often accept incorrect PDF code and silently work around it, then, when you close the file, offer to rewrite the whole PDF structure.

    The new file will have very little in common with the original, as it will often be saved "Web Enhanced" (/Linearized) and, in this case, with compressed object streams (/ObjStm) and cross-reference streams (/XRef) in place of a plain xref table.

    That total rewrite can be triggered by small syntax errors, several wrong content-stream operators, or wrong embedded values.

    The usual process in most PDF readers and editors is to look at the head and the tail of the file first, then use the decimal byte offsets stored there (startxref and the xref table) to jump directly to each object.

    Thus, critically, the trailer and xref data must be correct; otherwise the program falls back to error-recovery routines, which may or may not cope with those instructions and data.

    In this simplified case the xref table must be "position perfect" and include the byte offsets of ALL the objects.

    For the example above, we should have something like this:

    %PDF-1.4
    %£¬£¬
    1 0 obj
    << /Pages 2 0 R /Type /Catalog >>
    endobj
    2 0 obj
    << /Count 1 /Kids [ 3 0 R ] /Type /Pages >>
    endobj
    3 0 obj
    << /Contents 4 0 R /MediaBox [ 0 0 500 800 ] /Parent 2 0 R /Resources 5 0 R /Type /Page >>
    endobj
    4 0 obj
    << /Length 60 >>
    stream
    BT /F1 24 Tf 175 720 Td <feff004821260065006c006c006f> Tj ET
    endstream
    endobj
    5 0 obj
    << /Font << /F1 6 0 R >> >>
    endobj
    6 0 obj
    << /BaseFont /Courier /Subtype /Type1 /Type /Font >>
    endobj
    xref
    0 7
    0000000000 65535 f 
    0000000015 00000 n 
    0000000064 00000 n 
    0000000123 00000 n 
    0000000229 00000 n 
    0000000339 00000 n 
    0000000382 00000 n 
    trailer
    << /Root 1 0 R /Size 7 /ID [ <89311A609A751F1666063E6962E79BD5><89311A609A751F1666063E6962E79BD5> ] >>
    startxref
    450
    %%EOF
    

    So what are the key differences in my version?

    1. A common problem is that saving the file as UTF-8 corrupts the single-byte (ANSI) structure. The binary marker I show, £¬£¬, is 4 bytes, but UTF-8 turns it into 8 bytes (or, alternatively, you see 2 compound characters where there should be 4 simple ANSI bytes). As a result, ALL the addresses from that point onwards are likely to be wrong. So the offset of object 1 (the second xref entry) should typically be 15, 16 or 17, depending on whether the preceding lines end in one or two bytes (see point 2). In this case you correctly use 15.

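    2. The cross-reference (xref) table is the key index, and each entry must be exactly 20 bytes including its end-of-line. So if it is written with Linux-style line endings, each xref line needs a space before the line feed; with Windows-style line endings, NO space. The EOL here is therefore either 20 0A (space + LF) or 0D 0A (CR + LF); with old Mac-style endings it is 20 0D.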

    3. The pointer values must be 100% correct, or the wrong part of the code may run. This is where our files differ: you have /Length 57, I have /Length 60 (the actual byte count of the stream content), and so from that point onwards all the pointers are suspect.

    4. The number of xref entries (which must match /Size in the trailer) will usually be one more than the number of objects, in use or free, whatever their numbering, because entry 0 is the reserved head of the free list.
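The four points above can be enforced by generating the file as bytes and letting the program do the counting. Below is a minimal Python sketch under these assumptions: uncompressed objects numbered 1..n in order, a single xref subsection, and LF line endings. The names `build_pdf`, `BINARY_MARKER` and `EXAMPLE_OBJECTS` are hypothetical, and deriving /ID from an MD5 of the body is just one simple choice, not a requirement.

```python
import hashlib
import sys

# Line 2 binary comment: four bytes above 127 (Latin-1 "£¬£¬", as in the answer).
# Kept as raw bytes so that no text encoding can turn it into 8 bytes.
BINARY_MARKER = b"%\xa3\xac\xa3\xac\n"


def build_pdf(objects: list) -> bytes:
    """Serialize objects 1..n with a byte-exact classic xref table.

    Each item is either a dictionary as bytes, e.g. b"<< /Type /Catalog >>",
    or an (extra_keys, data) tuple for a stream, where extra_keys is bytes with
    a leading space (or b"") and /Length is computed from data. LF line endings
    throughout, so every xref entry ends in space + LF and is 20 bytes long.
    """
    out = bytearray(b"%PDF-1.4\n" + BINARY_MARKER)
    offsets = []
    for num, obj in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "num 0 obj" starts
        out += b"%d 0 obj\n" % num
        if isinstance(obj, tuple):
            extra, data = obj
            # /Length counts the stream bytes only, not the EOL before endstream.
            out += b"<< /Length %d%s >>\nstream\n" % (len(data), extra)
            out += data + b"\nendstream\n"
        else:
            out += obj + b"\n"
        out += b"endobj\n"

    xref_offset = len(out)
    size = len(objects) + 1  # entry 0 is the reserved free-list head
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    # Assumption: any file-unique 16-byte value works; MD5 of the body is one option.
    file_id = hashlib.md5(bytes(out)).hexdigest().upper().encode("ascii")
    out += b"trailer\n<< /Root 1 0 R /Size %d /ID [ <%s><%s> ] >>\n" % (size, file_id, file_id)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


# The single-page example from the question and answer.
EXAMPLE_OBJECTS = [
    b"<< /Pages 2 0 R /Type /Catalog >>",
    b"<< /Count 1 /Kids [ 3 0 R ] /Type /Pages >>",
    b"<< /Contents 4 0 R /MediaBox [ 0 0 500 800 ] /Parent 2 0 R /Resources 5 0 R /Type /Page >>",
    (b"", b"BT /F1 24 Tf 175 720 Td <FEFF004821260065006C006C006F> Tj ET"),
    b"<< /Font << /F1 6 0 R >> >>",
    b"<< /BaseFont /Courier /Subtype /Type1 /Type /Font >>",
]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Binary mode: no newline translation and no text encoding on write.
    with open(sys.argv[1], "wb") as f:
        f.write(build_pdf(EXAMPLE_OBJECTS))
```

With `EXAMPLE_OBJECTS` this produces the same byte offsets as the corrected listing (objects at 15, 64, 123, 229, 339 and 382, startxref 450). Because every offset and /Length comes from `len()`, editing the content stream cannot leave them stale, and writing with `"wb"` sidesteps the encoding and line-ending problems from points 1 and 2.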