Tags: .net-core, itext7

Why do I get the same attachment twice from a PDF with iText7?


I add an XML attachment to a PDF file with:

using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Filespec;

string path = @"E:\test\a4_blank.pdf";
string xmlpath = @"C:\Users\yalin\Desktop\1.xml";
string xmlFileDisplayName = "1.xml";
var bytes = File.ReadAllBytes(xmlpath);

PdfDocument pdfDoc = new PdfDocument(new PdfReader(path), new PdfWriter(@"e:\test\out\embedxml.pdf"));
// Wrap the XML bytes in an embedded file specification (AFRelationship /Data)
PdfFileSpec fs = PdfFileSpec.CreateEmbeddedFileSpec(pdfDoc, bytes, "xml data file attachment", xmlFileDisplayName, PdfName.ApplicationXml, null, new PdfName("Data"));
// Register it under the key "1.xml" in the document's EmbeddedFiles name tree
pdfDoc.AddFileAttachment(xmlFileDisplayName, fs);
pdfDoc.Close();

When I then extract this XML attachment with:

using System.IO;
using iText.Kernel.Pdf;

string path = @"e:\test\out\embedxml.pdf";
string outPath = @"e:\test\out";
PdfDocument pdfDoc = new PdfDocument(new PdfReader(path));
PdfDictionary root = pdfDoc.GetCatalog().GetPdfObject();
PdfDictionary documentNames = root.GetAsDictionary(PdfName.Names);
if (documentNames != null)
{
    PdfDictionary embeddedFiles = documentNames.GetAsDictionary(PdfName.EmbeddedFiles);
    if (embeddedFiles != null)
    {
        // The Names array alternates: name, file spec, name, file spec, ...
        PdfArray filespecs = embeddedFiles.GetAsArray(PdfName.Names);
        for (int i = 1; i < filespecs.Size(); i += 2)
        {
            PdfDictionary filespec = filespecs.GetAsDictionary(i);
            // Iterate over all entries of the file spec's EF dictionary
            PdfDictionary file = filespec.GetAsDictionary(PdfName.EF);
            foreach (PdfName key in file.KeySet())
            {
                var fos = File.Create(Path.Combine(outPath, filespec.GetAsString(key).ToString()));
                var stream = file.GetAsStream(key);
                fos.Write(stream.GetBytes());
                fos.Flush();
                fos.Close();
            }
        }
    }
}
pdfDoc.Close();

I get it twice. Is there a problem with how I add the attachments, or with how I read them? Thanks.


Solution

  • You iterate over all the entries of the EF (embedded file) dictionary in each file specification dictionary:

    PdfDictionary file = filespec.GetAsDictionary(PdfName.EF);
    foreach (PdfName key in file.KeySet())
    {
        var fos = File.Create(Path.Combine(outPath, filespec.GetAsString(key).ToString()));
        var stream = file.GetAsStream(key);
        fos.Write(stream.GetBytes());
        fos.Flush();
        fos.Close();
    }
    

    But an EF dictionary may contain several entries, all pointing to the same embedded file stream or to different variants of it.

    In your document, the EF dictionary contains two entries, for the F and UF keys, both pointing to the same stream holding the embedded XML. So your inner loop processes that one stream twice.
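    To check this for a particular file, you can look at which object each EF key resolves to. The sketch below assumes iText 7.1 or later for .NET; `EfInspector` and `MapEfKeysToObjects` are hypothetical names used only for illustration. For a file specification created with `PdfFileSpec.CreateEmbeddedFileSpec`, it should report the same object number for F and UF.

    ```csharp
    using System.Collections.Generic;
    using iText.Kernel.Pdf;

    // Hypothetical diagnostic helper, not part of iText.
    public static class EfInspector
    {
        // Maps every key of a file specification's EF dictionary (F, UF, ...)
        // to the object number of the embedded file stream it refers to.
        public static Dictionary<string, int> MapEfKeysToObjects(PdfDictionary filespec)
        {
            var result = new Dictionary<string, int>();
            PdfDictionary ef = filespec.GetAsDictionary(PdfName.EF);
            if (ef == null) return result;

            foreach (PdfName key in ef.KeySet())
            {
                PdfStream stream = ef.GetAsStream(key);
                // Streams are always indirect objects, so a stream read from a
                // file should carry its indirect reference.
                PdfIndirectReference reference = stream?.GetIndirectReference();
                if (reference != null)
                    result[key.GetValue()] = reference.GetObjNumber();
            }
            return result;
        }
    }
    ```

    Two keys with the same object number mean the loop in the question would write identical bytes twice.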

    As for your question whether there is a problem with adding the attachments or with getting them:

    This structure is correct (and even recommended) according to the PDF specification. So there is no problem with the way you add the attachments.

    Consequently, when reading attachments you have to interpret the contents of the EF dictionary more carefully and be prepared to find several entries in it, with identical or differing contents.
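    One way to do that is to take exactly one stream per file specification, preferring UF and falling back to F, instead of looping over every EF key. This is a sketch assuming iText 7.1 or later for .NET and a flat EmbeddedFiles name tree (no Kids); `AttachmentReader` and `ReadAttachments` are hypothetical names, and ignoring other variants such as DOS, Mac or Unix is a design choice of this sketch, not something iText requires.

    ```csharp
    using System.Collections.Generic;
    using iText.Kernel.Pdf;

    // Hypothetical helper, not part of iText.
    public static class AttachmentReader
    {
        // Returns one (name, bytes) pair per file specification in the flat /Names
        // array of the EmbeddedFiles name tree. Name trees using /Kids are not handled.
        public static Dictionary<string, byte[]> ReadAttachments(PdfDocument pdfDoc)
        {
            var result = new Dictionary<string, byte[]>();
            PdfDictionary names = pdfDoc.GetCatalog().GetPdfObject().GetAsDictionary(PdfName.Names);
            PdfDictionary embeddedFiles = names?.GetAsDictionary(PdfName.EmbeddedFiles);
            PdfArray filespecs = embeddedFiles?.GetAsArray(PdfName.Names);
            if (filespecs == null) return result;

            // The array alternates name and file specification: take every second entry.
            for (int i = 1; i < filespecs.Size(); i += 2)
            {
                PdfDictionary filespec = filespecs.GetAsDictionary(i);
                PdfDictionary ef = filespec?.GetAsDictionary(PdfName.EF);
                if (ef == null) continue;

                // Pick exactly ONE stream: UF first, then F; other EF keys are ignored.
                PdfStream stream = ef.GetAsStream(PdfName.UF) ?? ef.GetAsStream(PdfName.F);
                if (stream == null) continue;

                // Same preference for the file name; ToUnicodeString handles UTF-16BE names.
                PdfString name = filespec.GetAsString(PdfName.UF) ?? filespec.GetAsString(PdfName.F);
                string fileName = name != null ? name.ToUnicodeString() : "attachment" + i;

                // GetBytes() returns the decoded (decompressed) file content.
                // A later attachment with the same name overwrites an earlier one here.
                result[fileName] = stream.GetBytes();
            }
            return result;
        }
    }
    ```

    To write the results to disk as in the question, you could loop over the returned dictionary and call `File.WriteAllBytes(Path.Combine(outPath, Path.GetFileName(entry.Key)), entry.Value)`; `Path.GetFileName` strips any directory parts an attachment name might contain.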