I've been floundering and going in circles with this problem for a couple of days now. I'm hoping someone here can help.
I have a PDF document in a FileStream. Using iText7 8.0.3 from C#, I'd like to find all instances of the keyword 'and', highlight them with a red background, and then save the document to a memory stream and from there to a copy of the PDF.
Here's my code, which almost works. It does render the red backgrounds, but in the wrong relative locations:
using iText.Kernel.Colors;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;
using System.Collections.Generic;      // List<T>
using System.IO;
using System.Linq;                     // ToList()
using System.Text.RegularExpressions;  // Regex

FileStream src = new FileStream("C:\\Temp\\34207180.pdf", FileMode.Open);
MemoryStream ms = new MemoryStream();
string keyword = "and";
PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(ms));
int pdfPages = pdfDoc.GetNumberOfPages();
for (int page = 1; page <= pdfPages; page++)
{
    // Find every match of the keyword on this page.
    Regex regex = new Regex(keyword, RegexOptions.IgnoreCase);
    RegexBasedLocationExtractionStrategy extractionStrategy = new RegexBasedLocationExtractionStrategy(regex);
    PdfCanvasProcessor parser = new PdfCanvasProcessor(extractionStrategy);
    parser.ProcessPageContent(pdfDoc.GetPage(page));
    List<IPdfTextLocation> locs = extractionStrategy.GetResultantLocations().ToList();

    // Draw a red rectangle over each match in a new content stream appended to the page.
    PdfCanvas pdfCanvas = new PdfCanvas(pdfDoc.GetPage(page).NewContentStreamAfter(), pdfDoc.GetPage(page).GetResources(), pdfDoc);
    foreach (var l in locs)
    {
        pdfCanvas
            .SaveState()
            .SetFillColor(ColorConstants.RED)
            .Rectangle(l.GetRectangle().GetX(), l.GetRectangle().GetY(), l.GetRectangle().GetWidth(), l.GetRectangle().GetHeight())
            .Fill()
            .RestoreState();
    }
}
pdfDoc.Close();
byte[] img = ms.ToArray();
File.WriteAllBytes("C:\\Temp\\34207180-dest.pdf", img);
And here are the example input and output PDF files: Source, Destination
Can anyone explain what's going on? It's as if GetResultantLocations returns values in a different scale from the one PdfCanvas expects for Rectangle and Fill.
I have read many articles on this site and elsewhere without finding a resolution.
The drawing you do is affected by a transformation matrix set in the original content.
To be unaffected by any active transformation matrix, you could use the PdfCanvas constructor PdfCanvas(PdfPage page, bool wrapOldContent). Passing true for wrapOldContent wraps the existing content in save state and restore state operators, so your drawing starts from a pristine graphics state.
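To make the effect concrete, here is a minimal sketch in plain C# (no iText involved) of what a leftover matrix does to a coordinate. The 0.75 scale and the point (100, 700) are made-up values for illustration, not taken from the PDF in the question; the assumption is that the extraction strategy reports locations in default page coordinates while the appended drawing is still subject to whatever matrix the old content left behind.

```csharp
using System;
using System.Numerics;

// Hypothetical leftover transformation: suppose the original content ran
// "0.75 0 0 0.75 0 0 cm" without a balancing q/Q (assumed values).
Matrix3x2 leftoverCtm = Matrix3x2.CreateScale(0.75f);

// A location as reported for a match, in default page coordinates (assumed values).
Vector2 reported = new Vector2(100f, 700f);

// Drawing in a stream appended after the old content: the leftover matrix still applies.
Vector2 drawnAppended = Vector2.Transform(reported, leftoverCtm);

// Drawing after the old content was wrapped in q ... Q: the matrix is back to identity.
Vector2 drawnWrapped = Vector2.Transform(reported, Matrix3x2.Identity);

Console.WriteLine($"appended: {drawnAppended.X}, {drawnAppended.Y}"); // appended: 75, 525
Console.WriteLine($"wrapped:  {drawnWrapped.X}, {drawnWrapped.Y}");   // wrapped:  100, 700
```

With the leftover scale in place the rectangle lands at (75, 525) instead of (100, 700), which is the kind of offset that looks like a "different scale" on the page.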
The drawn rectangles will block out the text. That can be fixed by setting the blend mode to multiply on the canvas: pdfCanvas.SetExtGState(new PdfExtGState().SetBlendMode(PdfExtGState.BM_MULTIPLY))
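As a rough illustration of why multiply keeps the text readable: in the multiply blend mode each colour channel of the result is the backdrop value times the source value, with components in the range 0 to 1. The sketch below is plain C# arithmetic, not iText code, and it assumes black text on a white page; the Multiply helper is a hypothetical name used only here.

```csharp
using System;

// Multiply blend: result = backdrop * source, per colour channel (components 0..1).
static (double R, double G, double B) Multiply(
    (double R, double G, double B) backdrop,
    (double R, double G, double B) source)
    => (backdrop.R * source.R, backdrop.G * source.G, backdrop.B * source.B);

var red = (R: 1.0, G: 0.0, B: 0.0);        // the highlight fill
var whitePaper = (R: 1.0, G: 1.0, B: 1.0); // assumed page background under the rectangle
var blackText = (R: 0.0, G: 0.0, B: 0.0);  // assumed glyph colour under the rectangle

var overPaper = Multiply(whitePaper, red); // white * red = red
var overText = Multiply(blackText, red);   // black * red = black

Console.WriteLine(overPaper); // (1, 0, 0)
Console.WriteLine(overText);  // (0, 0, 0)
```

So the white background turns red while the black glyphs stay black, instead of everything under the rectangle being painted over.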
I have updated part of your code to reflect these and some other changes:
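// Requires: using iText.Kernel.Pdf.Extgstate; (for PdfExtGState)

// true = wrap the old content in save/restore state, so no leftover matrix applies here.
PdfCanvas pdfCanvas = new PdfCanvas(pdfDoc.GetPage(page), true);
pdfCanvas.SaveState();
pdfCanvas.SetFillColor(ColorConstants.RED);
// Multiply blending lets the text underneath show through the red fill.
pdfCanvas.SetExtGState(new PdfExtGState().SetBlendMode(PdfExtGState.BM_MULTIPLY));
foreach (var l in locs)
{
    pdfCanvas
        .Rectangle(l.GetRectangle().GetX(), l.GetRectangle().GetY(), l.GetRectangle().GetWidth(),
            l.GetRectangle().GetHeight())
        .Fill();
}
pdfCanvas.RestoreState();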