I'm trying to generate a PDF file programmatically.
The full scenario: I receive multi-page PDFs. Each page is an image containing the content I want. I don't want to use external libraries because I'm aiming for performance and optimization (in the long run it will matter to me). I previously had something working (I built a system along the lines of header / file content (image) / footer), and it always worked. However, something has changed and it stopped working.
Anyway, in order to fix it and build from scratch, here are the steps I executed:
I don't know what else to do, since everything seems to be almost exactly the same. I've also decoded some of the FlateDecode strings inside the PDF file, but I couldn't find anything related to object positioning inside the file.
Here's the code I'm using:
using (var b = new BinaryWriter(File.Open(@"C:\test\Rio\Reboot\fullmanual01.pdf", FileMode.Create)))
{
    var imgBytes = File.ReadAllBytes(@"C:\test\Rio\Reboot\decompressedimg.raw");
    var firstFlate = File.ReadAllBytes(@"C:\test\Rio\Reboot\flateStr01.raw");
    var FlateDecompressed = Encoding.ASCII.GetString(FlateDecompress(firstFlate));
    string crlf = Environment.NewLine;

    var pdfHeader = Encoding.ASCII.GetBytes($"%PDF-1.4{crlf}");
    b.Write(pdfHeader);
    pdfHeader = StringToByteArray("25E2E3CFD30D0A");
    b.Write(pdfHeader);

    var pdfObj = new PDFStrObject(1, $"/Type /Page{crlf}/MediaBox [ 0 0 595 769 ]{crlf}/Resources << /XObject << /X0 3 0 R >> >>{crlf}/Contents 4 0{crlf}/Parent 2 0 R{crlf}/Rotate 360{crlf}>>{crlf}endobj{crlf}").byteFromStrObj;
    b.Write(pdfObj);

    var secondObjPos = b.BaseStream.Position.ToString("0000000000");
    pdfObj = new PDFStrObject(3, $"/Type /XObject{crlf}/Subtype /Image{crlf}/Width 1016{crlf}/Height 1328{crlf}/BitsPerComponent 8{crlf}/ColorSpace /DeviceGray{crlf}/Filter /FlateDecode{crlf}/Length {imgBytes.Length}{crlf}>>{crlf}stream{crlf}").byteFromStrObj;
    b.Write(pdfObj);
    b.Write(imgBytes);
    b.Write(Encoding.ASCII.GetBytes($"{crlf}endstream{crlf}endobj{crlf}"));

    var thirdObjPos = b.BaseStream.Position.ToString("0000000000");
    pdfObj = new PDFStrObject(4, $"/Filter /FlateDecode{crlf}/Length 45{crlf}>>{crlf}stream{crlf}").byteFromStrObj;
    b.Write(pdfObj);
    b.Write(firstFlate);
    b.Write(Encoding.ASCII.GetBytes($"{crlf}endstream{crlf}endobj{crlf}"));

    var secondPos = b.BaseStream.Position;
    pdfObj = new PDFStrObject(2, $"/Type /Pages{crlf}/Kids [ 1 0 R ]{crlf}/Count 1{crlf}>>{crlf}endobj{crlf}").byteFromStrObj;
    b.Write(pdfObj);

    var firstObjPos = b.BaseStream.Position.ToString("0000000000"); //2 0 obj
    pdfObj = new PDFStrObject(5, $"/Type /Catalog{crlf}/Pages 2 0{crlf}>>{crlf}endobj{crlf}").byteFromStrObj;
    b.Write(pdfObj);

    var fourthObhPos = b.BaseStream.Position.ToString("0000000000");
    b.Write(Encoding.ASCII.GetBytes($"xref{crlf}0 6{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"0000000000 65535 f{crlf}0000000017 00000 n{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"{firstObjPos} 00000 n{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"{secondObjPos} 00000 n{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"{thirdObjPos} 00000 n{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"{fourthObhPos} 00000 n{crlf}"));
    b.Write(Encoding.ASCII.GetBytes($"trailer{crlf}<<{crlf}/Size 6{crlf}/Root 5 0{crlf}/ID [<05bebfaf5c6382cfbc44cd1b3389e097><05bebfaf5c6382cfbc44cd1b3389e097>]{crlf}>>{crlf}startxref{crlf}{b.BaseStream.Position+7}{crlf}%%EOF{crlf}"));
}
And here is the class for building objects:
class PDFStrObject
{
    public string strObj { get; private set; }
    public byte[] byteFromStrObj { get; private set; }

    public PDFStrObject(int objNum, string content)
    {
        string crlf = Environment.NewLine;
        strObj = $"{objNum} 0 obj{crlf}<<{crlf}{content}";
        byteFromStrObj = Encoding.ASCII.GetBytes(strObj);
    }
}
The files I've been using are here: https://drive.google.com/drive/folders/11HN9cB9Cs7uqBQdpZkNyNKt29sl_xJrL?usp=sharing
The description is:
decompressedimg-convertido.pdf -> The file I converted online.
decompressedimg.raw -> The image portion I extracted from the multi-page PDF. Dimensions are W: 1016, H: 1328.
fullmanual01.pdf -> The file I generated using my code.
PDfRjMultiplePages -> The multi-page PDF file I want to programmatically extract pages from.
Any input is appreciated. I've also referred to the question "Issue writing a PDF file from scratch", but unfortunately couldn't find a hint for what I'm trying to do.
Thanks
The first thing that stands out is that your startxref is pointing to the wrong spot.
It points to the red spot, but should point to the blue spot, that is, the byte offset at which the xref keyword begins.
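One way to check this kind of problem without a hex editor is to read the number after the last startxref and see whether the bytes at that offset really spell xref. The following is only a sketch: the class and method names (StartXrefCheck, PointsAtXref) are made up for illustration, and it assumes a classic cross-reference table rather than a PDF 1.5 cross-reference stream.

```csharp
using System;
using System.Text;

static class StartXrefCheck
{
    // Returns true when the offset written after the last "startxref"
    // lands exactly on the "xref" keyword.
    // Assumption: the file uses a classic xref table, not an xref stream.
    public static bool PointsAtXref(byte[] pdf)
    {
        // ISO-8859-1 maps every byte to exactly one char,
        // so string indexes are the same as byte offsets.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdf);

        int marker = text.LastIndexOf("startxref", StringComparison.Ordinal);
        if (marker < 0) return false;

        int eof = text.IndexOf("%%EOF", marker, StringComparison.Ordinal);
        if (eof < 0) return false;

        // The decimal offset sits between "startxref" and "%%EOF".
        int start = marker + "startxref".Length;
        string number = text.Substring(start, eof - start).Trim();
        if (!long.TryParse(number, out long offset)) return false;
        if (offset < 0 || offset + 4 > text.Length) return false;

        return string.CompareOrdinal(text, (int)offset, "xref", 0, 4) == 0;
    }
}
```

Calling StartXrefCheck.PointsAtXref(File.ReadAllBytes(path)) on the generated file should return false for as long as the startxref value is off.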
The other obvious issue is that you have an earlier xref table in the middle of the file, so you attempted (perhaps inadvertently) to create either a Linearized or an Incremental PDF file. Based on your description, there is no point in doing either of those. You should just stick to a basic PDF, with one xref table at the end of the file.
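As a rough illustration of that basic layout, here is a minimal sketch that writes a one-page, empty PDF with a single xref table at the end. It is not your code: the class name MinimalPdf, the three object bodies, and the A4 MediaBox are assumptions chosen to keep the example short. The points it tries to show are that each offset is recorded before the object is written, and that startxref is the position captured just before the xref keyword.

```csharp
using System;
using System.IO;
using System.Text;

static class MinimalPdf
{
    // Builds a one-page, empty PDF in memory (illustrative object bodies).
    public static byte[] Build()
    {
        // A fixed 2-byte EOL keeps every xref entry at exactly 20 bytes.
        const string eol = "\r\n";
        var bodies = new[]
        {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [ 0 0 595 842 ] /Resources << >> >>"
        };

        using (var ms = new MemoryStream())
        {
            void Write(string s)
            {
                var bytes = Encoding.ASCII.GetBytes(s);
                ms.Write(bytes, 0, bytes.Length);
            }

            Write("%PDF-1.4" + eol);

            var offsets = new long[bodies.Length];
            for (int i = 0; i < bodies.Length; i++)
            {
                offsets[i] = ms.Position; // offset BEFORE "N 0 obj" is written
                Write($"{i + 1} 0 obj{eol}{bodies[i]}{eol}endobj{eol}");
            }

            long xrefPos = ms.Position; // offset of the "xref" keyword itself
            Write($"xref{eol}0 {bodies.Length + 1}{eol}");
            Write($"0000000000 65535 f{eol}");
            foreach (long off in offsets)
                Write($"{off:D10} 00000 n{eol}"); // entries in object-number order

            Write($"trailer{eol}<< /Size {bodies.Length + 1} /Root 1 0 R >>{eol}");
            Write($"startxref{eol}{xrefPos}{eol}%%EOF{eol}");
            return ms.ToArray();
        }
    }
}
```

Writing the objects in object-number order is not required by the format, but it keeps the offsets array and the xref entries trivially aligned, which avoids the easy mistake of pairing an offset with the wrong object number.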
You should take a closer look at that post you referenced, it seems like a good starting point.
The PDF 1.7 spec also provides very simple "hello world" examples.
There may very well be other issues. You may want to reconsider using a 3rd party library to create your PDF files.