I need to compare two PDF files for equality. The two files need to be identical in content, and I'm not having any success with the proposals found on:
https://stackoverflow.com/a/36108862/2807741
    public static bool AreFileContentsEqual(String path1, String path2) =>
        File.ReadAllBytes(path1).SequenceEqual(File.ReadAllBytes(path2));
and
https://stackoverflow.com/a/76917554/2807741
    private bool AreFilesEqual(string file1Path, string file2Path)
    {
        string file1Hash = "", file2Hash = "";
        SHA1 sha = new SHA1CryptoServiceProvider();
        using (FileStream fs = System.IO.File.OpenRead(file1Path))
        {
            byte[] hash;
            hash = sha.ComputeHash(fs);
            file1Hash = Convert.ToBase64String(hash);
        }
        using (FileStream fs = System.IO.File.OpenRead(file2Path))
        {
            byte[] hash;
            hash = sha.ComputeHash(fs);
            file2Hash = Convert.ToBase64String(hash);
        }
        return (file1Hash == file2Hash);
    }
(among other links I've tried).
Both methods always return false when I compare two "identical" files. They return true only when I compare a file with itself.
This is how I created the files to compare:
Could something be changing in the second file when it is saved, even though I'm not making any modifications to it?
file1.pdf:
file2.pdf
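One way to test that suspicion is to find the first byte at which the two files diverge and inspect the bytes around that offset. PDF writers often embed save-specific data such as a modification date or a document ID, so two saves of the same content can differ at the byte level. The following is a minimal sketch, not a definitive diagnostic; `FileDiff` and `FirstDifference` are hypothetical names introduced here for illustration.

```csharp
using System;
using System.IO;

public static class FileDiff
{
    // Hypothetical helper: returns the offset of the first differing byte,
    // or -1 if the two files are byte-for-byte identical.
    // If one file is a prefix of the other, the shorter file's length is returned.
    public static long FirstDifference(string path1, string path2)
    {
        using (var a = new BufferedStream(File.OpenRead(path1)))
        using (var b = new BufferedStream(File.OpenRead(path2)))
        {
            long offset = 0;
            while (true)
            {
                int x = a.ReadByte(); // -1 signals end of stream
                int y = b.ReadByte();
                if (x != y) return offset; // also covers one stream ending first
                if (x == -1) return -1;    // both ended together: identical
                offset++;
            }
        }
    }
}
```

If the reported offset falls inside the document metadata or the trailer rather than inside a page's content, a byte or hash comparison cannot answer the "same content" question, and the text has to be compared instead.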
Edit 1:
When I say "Identical" I mean identical in content. The PDFs will contain amounts (numbers), and those amounts in the PDF bills must be exactly the same.
OK, I'll answer my own question. iText7 is the way to go, as it can read a PDF file's content as text.
Nuget package: https://www.nuget.org/packages/itext7
    public IActionResult Index()
    {
        // Locate the Assets folder next to the executing assembly.
        var exeFilePath = System.Reflection.Assembly.GetExecutingAssembly().Location;
        var workPath = Path.Combine(Path.GetDirectoryName(exeFilePath), "Assets");
        var file1 = Path.Combine(workPath, "testpdfv1.pdf");
        var file2a = Path.Combine(workPath, "testpdfv2equalv1.pdf");
        var file2b = Path.Combine(workPath, "testpdfv2differentv1.pdf");

        // Compare the extracted text rather than the raw bytes.
        var fileContents1 = PdfToText(file1);
        var fileContents2 = PdfToText(file2a);
        var filesAreEqual = fileContents1 == fileContents2;
        return View();
    }
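    // Requires: using System.Text; using iText.Kernel.Pdf;
    // using iText.Kernel.Pdf.Canvas.Parser; using iText.Kernel.Pdf.Canvas.Parser.Listener;
    private string PdfToText(string pdfPath)
    {
        var pdfDocument = new PdfDocument(new PdfReader(pdfPath));
        try
        {
            var result = new StringBuilder();
            for (int i = 1; i <= pdfDocument.GetNumberOfPages(); ++i)
            {
                // Use a fresh strategy per page: a reused LocationTextExtractionStrategy
                // keeps the text of earlier pages and would repeat it in later results.
                var strategy = new LocationTextExtractionStrategy();
                result.Append(PdfTextExtractor.GetTextFromPage(pdfDocument.GetPage(i), strategy));
            }
            return result.ToString();
        }
        finally
        {
            // Close the document even if extraction throws.
            pdfDocument.Close();
        }
    }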
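Extracted text can differ in line breaks or spacing even when the visible amounts are the same, so it may help to normalize whitespace before comparing. The following is a minimal sketch under that assumption; `TextCompare`, `NormalizeWhitespace` and `SameContent` are hypothetical names, and whether whitespace differences should be ignored depends on your requirements.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextCompare
{
    // Hypothetical helper: collapses every run of whitespace to one space and trims the ends.
    public static string NormalizeWhitespace(string text) =>
        Regex.Replace(text ?? string.Empty, @"\s+", " ").Trim();

    // Ordinal comparison of the normalized texts, so digits and punctuation must match exactly.
    public static bool SameContent(string text1, string text2) =>
        string.Equals(NormalizeWhitespace(text1), NormalizeWhitespace(text2), StringComparison.Ordinal);
}
```

With this in place, the comparison in `Index()` could read `var filesAreEqual = TextCompare.SameContent(fileContents1, fileContents2);`.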