I am trying to convert a JDF file to a PDF file using C#.
After looking at the JDF format, I can see that the file is simply an XML block placed at the top of a PDF document.
I've tried using StreamReader / StreamWriter in C#, but because the PDF part also contains binary data and mixed line endings (\r\n and \n), the file produced cannot be opened: some of the PDF's binary data is destroyed. Here is some of the code I've tried without success.
using (StreamReader reader = new StreamReader(_jdf.FullName, Encoding.Default))
{
    using (StreamWriter writer = new StreamWriter(_pdf.FullName, false, Encoding.Default))
    {
        writer.NewLine = "\n"; //Tried without this and with \r\n
        bool IsStartOfPDF = false;
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            if (line.IndexOf("%PDF-") != -1)
            {
                IsStartOfPDF = true;
            }
            if (!IsStartOfPDF)
            {
                continue;
            }
            writer.WriteLine(line);
        }
    }
}
I am self-answering this question, as it may be a fairly common problem and the solution could be useful to others.
As the document contains both binary data and text, we cannot simply use a StreamWriter to write the binary part back to another file. Even if you just read a file with a StreamReader and write all of its contents to another file with a StreamWriter, you will notice differences between the two files.
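To make the corruption concrete, here is a minimal sketch of the two things a text reader/writer pair does to binary data: it decodes and re-encodes the bytes, and it rewrites line endings. The class and method names (RoundTripDemo, ThroughText, SurvivesTextRoundTrip, NormalizeLines) are made up for this illustration, and it uses in-memory strings rather than files.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class RoundTripDemo
{
    // Decode the bytes as text and encode them again, which is roughly
    // what a StreamReader/StreamWriter pair does to a file's contents.
    public static byte[] ThroughText(byte[] input, Encoding encoding)
    {
        string text = encoding.GetString(input);
        return encoding.GetBytes(text);
    }

    // True only if every byte comes back unchanged.
    public static bool SurvivesTextRoundTrip(byte[] input, Encoding encoding)
    {
        return input.SequenceEqual(ThroughText(input, encoding));
    }

    // Mimic the ReadLine/WriteLine loop: every \r, \n or \r\n is treated
    // as a line break and replaced by the single newLine string.
    public static string NormalizeLines(string text, string newLine)
    {
        using (var reader = new StringReader(text))
        using (var writer = new StringWriter { NewLine = newLine })
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line);
            }
            return writer.ToString();
        }
    }
}
```

With Encoding.UTF8, a byte sequence such as 25 E2 E3 CF D3 (the kind of high-bit bytes commonly found near the top of a PDF) does not survive the round trip, because the invalid bytes are replaced during decoding. Separately, the ReadLine/WriteLine loop turns a lone \r or a \r\n inside a binary stream into whatever NewLine is set to, which changes the bytes and their offsets.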
You can use a BinaryReader and a BinaryWriter to search a multi-part document and write each byte to another file exactly as it was read.
//Using a BinaryReader/BinaryWriter as the file contains both text and binary data
using (var reader = new BinaryReader(File.Open(_file.FullName, FileMode.Open)))
{
    using (var writer = new BinaryWriter(File.Open(tempFileName.FullName, FileMode.CreateNew)))
    {
        //We are searching for the start of the PDF
        bool searchingForstartOfPDF = true;
        var startOfPDF = "%PDF-".ToCharArray();
        //While we haven't reached the end of the stream
        while (reader.BaseStream.Position != reader.BaseStream.Length)
        {
            //If we are still searching for the start of the PDF
            if (searchingForstartOfPDF)
            {
                //Read the current char
                var str = reader.ReadChar();
                //If it matches the first char of the PDF signature
                if (str.Equals(startOfPDF[0]))
                {
                    //Check the next few characters to see if they match,
                    //remembering our current position in the stream in case they don't
                    var currBasePos = reader.BaseStream.Position;
                    for (var i = 1; i < startOfPDF.Length; i++)
                    {
                        //If we found a char that isn't in the PDF signature, then resume the while loop
                        //to start searching again from the next position
                        if (!reader.ReadChar().Equals(startOfPDF[i]))
                        {
                            reader.BaseStream.Position = currBasePos;
                            break;
                        }
                        //If we've reached the end of the PDF signature then we've found a match
                        if (i == startOfPDF.Length - 1)
                        {
                            //Success
                            //Set the position back to the start of the PDF signature
                            searchingForstartOfPDF = false;
                            reader.BaseStream.Position -= startOfPDF.Length;
                            //We are no longer searching for the PDF signature, so
                            //the remaining bytes in the file will be written directly
                            //using the binary writer
                        }
                    }
                }
            }
            else
            {
                //We are writing the binary data now, one byte at a time
                writer.Write(reader.ReadByte());
            }
        }
    }
}
This code example uses the BinaryReader to read each char one by one. When it finds the string %PDF- (the PDF start signature), it moves the reader position back to the % and then writes the rest of the document using writer.Write(reader.ReadByte()).
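For comparison, here is a shorter sketch of the same idea that works purely on bytes, so no text decoding is involved at any point. It is not a drop-in replacement for the code above: it assumes the whole file fits comfortably in memory, and that the first occurrence of %PDF- really is the start of the embedded PDF. The names JdfPdfExtractor, FindPdfStart and ExtractPdf are made up for this example.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class JdfPdfExtractor
{
    // ASCII bytes of "%PDF-".
    private static readonly byte[] Signature = { 0x25, 0x50, 0x44, 0x46, 0x2D };

    // Returns the index of the first occurrence of the signature, or -1 if absent.
    public static int FindPdfStart(byte[] data)
    {
        for (int i = 0; i <= data.Length - Signature.Length; i++)
        {
            int j = 0;
            while (j < Signature.Length && data[i + j] == Signature[j])
            {
                j++;
            }
            if (j == Signature.Length)
            {
                return i;
            }
        }
        return -1;
    }

    // Copies everything from the signature onwards into pdfPath.
    // Returns false (and writes nothing) if no signature was found.
    // Assumption: the input file is small enough to load into memory.
    public static bool ExtractPdf(string jdfPath, string pdfPath)
    {
        byte[] data = File.ReadAllBytes(jdfPath);
        int start = FindPdfStart(data);
        if (start < 0)
        {
            return false;
        }
        using (var output = File.Create(pdfPath))
        {
            output.Write(data, start, data.Length - start);
        }
        return true;
    }
}
```

With the variables from the question, a call would look like JdfPdfExtractor.ExtractPdf(_jdf.FullName, _pdf.FullName). Comparing raw bytes also sidesteps the fact that BinaryReader.ReadChar decodes with the reader's encoding (UTF-8 by default), which may matter if the XML header contains non-ASCII characters. For very large files, a streaming copy would probably be a better fit than File.ReadAllBytes.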