I am writing a C# WPF application that inserts a 'header' page as the first page of each PDF document in a batch. The header page is taken from the first page of the first PDF in the batch.
The user starts this process, but I want to make sure that the user cannot run it again at a later date, which would insert a second header.
So my plan is to compute the SHA256 hash of the header page and compare it with the hash of the first page of each of the other PDFs. If they match, the first page is already the header page; if not, we insert the header.
I knocked up the code below to test getting the hash of the first page in a PDF, but the hash is different every time it is run.
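The comparison step of this plan can be sketched as below. This is a minimal sketch that assumes you already have some byte representation of each page; the class `HeaderCheck` and its methods are hypothetical names for illustration, and which bytes actually represent a page stably is the open question here (the bytes of a freshly saved PDF file are not stable, because the file embeds metadata that changes on every save).

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: the caller supplies the bytes that stand for a page.
// How to obtain stable page bytes is an assumption left open here.
public static class HeaderCheck
{
    // SHA-256 of the given bytes as an upper-case hex string
    // (same format as the "X2" loop in the question).
    public static string Sha256Hex(byte[] data)
    {
        using (SHA256 sha256 = SHA256.Create())
        {
            byte[] hash = sha256.ComputeHash(data);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // True when the first page differs from the header page,
    // i.e. the header still needs to be inserted.
    public static bool NeedsHeader(byte[] headerPageBytes, byte[] firstPageBytes)
    {
        string headerHash = Sha256Hex(headerPageBytes);
        string firstHash = Sha256Hex(firstPageBytes);
        return !string.Equals(headerHash, firstHash, StringComparison.Ordinal);
    }
}
```

Comparing hex strings with an ordinal comparison is enough here, since both strings are produced by the same formatting code.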
Why is it different every time?
Thanks
using System.IO;
using System.Text;
using System.Security.Cryptography;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

namespace Syncada
{
    public class PDFDoc
    {
        private PdfDocument pdfDoc;

        public PDFDoc(string path)
        {
            // Import mode allows pages of this document to be added to other documents.
            pdfDoc = PdfReader.Open(path, PdfDocumentOpenMode.Import);
        }

        public string GetPageOneHash()
        {
            byte[] hash;
            PdfPage page = pdfDoc.Pages[0];

            using (MemoryStream stream = new MemoryStream())
            using (SHA256 sha256 = SHA256.Create())
            {
                // Copy page 1 into a new, empty document and save that document to memory.
                PdfDocument doc = new PdfDocument();
                doc.AddPage(page);
                doc.Save(stream, false);

                // Rewind so ComputeHash reads the saved bytes from the start.
                stream.Position = 0;
                hash = sha256.ComputeHash(stream);
            }

            // Format the hash as an upper-case hex string.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < hash.Length; i++)
            {
                sb.Append(hash[i].ToString("X2"));
            }
            return sb.ToString();
        }
    }
}
You are not calculating the hash of the page but the hash of a new PDF document to which you add that page. Unfortunately for your purpose, a PDF document contains information such as the creation date, the last modification date, and a unique document ID. Because this information differs each time you create and save the new document, you will never get the same hash (barring a hash collision).
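To illustrate the effect, the sketch below builds two simplified stand-ins for a saved document. They are plain strings, not real PDF files, and the layout of the `/CreationDate` and `/ID` lines is an assumption for illustration only. The page content is identical in both, but because a timestamp and an ID are stamped in on each "save", the whole-document hashes differ while the hash of the page content alone stays the same.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Illustration only: these strings are simplified stand-ins, not real PDF files.
public static class VolatileMetadataDemo
{
    // SHA-256 of an ASCII string, formatted like the question's "X2" loop.
    public static string Sha256Hex(string text)
    {
        using (SHA256 sha256 = SHA256.Create())
        {
            byte[] hash = sha256.ComputeHash(Encoding.ASCII.GetBytes(text));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Builds a fake "saved document": the page content is identical every time,
    // but the creation date and ID are stamped fresh on each save.
    public static string FakeSave(string pageContent, DateTime created, Guid id)
    {
        return "/CreationDate (D:" + created.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture) + ")\n"
             + "/ID <" + id.ToString("N") + ">\n"
             + pageContent;
    }
}
```

Calling `FakeSave(page, DateTime.Now, Guid.NewGuid())` twice and hashing the results gives two different hashes, while `Sha256Hex(page)` is the same every time. For a real check you would need to hash something that does not carry per-save metadata, for example the page's own content; whether a given library returns those bytes unchanged between runs is an assumption you would have to verify.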