Tags: itext, biztalk, biztalk-2013

BizTalk custom pipeline parsing POP3 PDF attachment error


I have a BizTalk custom pipeline component that parses a PDF attachment into a custom model using iTextSharp. The pipeline is bound to a POP3 receive location.

If the newly created message returns the raw attachment stream (outputMessage.GetPart("Body").Data = ms), the message looks fine in the BizTalk Administration Console. I saved that message manually from the console, and it parsed correctly with the same parsing method that the pipeline uses.

When the PDF is parsed directly in the pipeline, I get the following error: Rebuild failed: trailer not found.; Original message: xref subsection not found at file pointer 1620729

If I remove the default XMLDisassembler component from the pipeline, the parsing error disappears, but in the console the message Body is empty, even though AttachmentSizeInBytes=1788.

public IBaseMessage Execute(IPipelineContext pContext, IBaseMessage pInMsg)
{
    return ExtractMessagePartToMessage(pContext, pInMsg);
}

private IBaseMessage ExtractMessagePartToMessage(IPipelineContext pContext, IBaseMessage pInMsg)
{
    // Index 0 is assumed to be the email body; the attachment is taken from index 1.
    if (pInMsg.PartCount <= 1)
    {
        throw new InvalidOperationException("The email had no attachment, apparently.");
    }

    string partName;
    IBaseMessagePart attachmentPart = pInMsg.GetPartByIndex(1, out partName);
    Stream attachmentPartStream = attachmentPart.GetOriginalDataStream();

    // New message with a single "Body" part that reuses the inbound context.
    IBaseMessage outputMessage;
    outputMessage = pContext.GetMessageFactory().CreateMessage();
    outputMessage.AddPart("Body", pContext.GetMessageFactory().CreateMessagePart(), true);
    outputMessage.Context = pInMsg.Context;

    // Buffer the attachment into a seekable MemoryStream for the PDF parser.
    var ms = new MemoryStream();
    attachmentPartStream.CopyTo(ms);
    ms.Seek(0L, SeekOrigin.Begin);

    Stream orderStream = PdfFormParser.Parse(ms);

    outputMessage.GetPart("Body").Data = orderStream;
    outputMessage.Context.Write("AttachmentName", "http://schemas.microsoft.com/BizTalk/2003/file-properties", partName);
    outputMessage.Context.Write("AttachmentSizeInBytes", "http://schemas.microsoft.com/BizTalk/2003/file-properties", orderStream.Length.ToString());

    // Register both streams with the resource tracker so the pipeline disposes them.
    pContext.ResourceTracker.AddResource(ms);
    pContext.ResourceTracker.AddResource(orderStream);

    return outputMessage;
}

public static Stream Parse(Stream pdfDocument)
{
    using (var reader = new PdfReader(pdfDocument))
    {
        var outputStream = new MemoryStream();
        var pdfForm = ParseInternal(reader);   // own form-field extraction (not shown)
        var xmlDocument = new XmlDocument();
        xmlDocument.LoadXml(pdfForm.Serialize());

        // Save() leaves outputStream positioned at its end; it is returned without being rewound.
        xmlDocument.Save(outputStream);

        return outputStream;
    }
}

Solution

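  • In pipelines, when you read or write a Stream, you have to rewind it to the beginning if something else is going to use it, especially the stream of the final message that you expect BizTalk to process. In the code above, xmlDocument.Save(outputStream) in Parse leaves outputStream positioned at its end, so BizTalk reads an empty Body even though orderStream.Length (and therefore AttachmentSizeInBytes) is 1788.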
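A minimal, self-contained sketch of the rewinding described above, using plain .NET types only (no BizTalk or iTextSharp dependencies). The helper names `CopyFromStart` and `SaveToRewoundStream` are hypothetical and not part of any BizTalk API. `Main` shows the symptom from the question (a stream with a non-zero `Length` that reads as empty) and the effect of rewinding.

```csharp
using System;
using System.IO;
using System.Xml;

public static class StreamRewindSketch
{
    // Hypothetical helper: copies a source stream that may already have been read
    // (Position > 0) into a new MemoryStream, rewinding the source first when it
    // supports seeking. The copy is returned at Position 0 so the next reader
    // (for example a PDF parser) sees all of the bytes.
    public static MemoryStream CopyFromStart(Stream source)
    {
        if (source.CanSeek)
        {
            source.Seek(0L, SeekOrigin.Begin);
        }

        var copy = new MemoryStream();
        source.CopyTo(copy);
        copy.Seek(0L, SeekOrigin.Begin);
        return copy;
    }

    // Hypothetical helper: serializes an XmlDocument into a MemoryStream and rewinds
    // it before returning, which is the step the question's Parse method leaves out.
    public static MemoryStream SaveToRewoundStream(XmlDocument doc)
    {
        var output = new MemoryStream();
        doc.Save(output);      // leaves Position == Length
        output.Position = 0;   // rewind so the next consumer reads from the start
        return output;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<Order><Id>42</Id></Order>");

        // Symptom: a stream left at its end has a non-zero Length but reads as empty.
        var notRewound = new MemoryStream();
        doc.Save(notRewound);
        using (var reader = new StreamReader(notRewound))
        {
            Console.WriteLine($"Length={notRewound.Length}, chars read={reader.ReadToEnd().Length}");
        }

        // Fix: the rewound stream reads back the whole document.
        using (var reader = new StreamReader(SaveToRewoundStream(doc)))
        {
            Console.WriteLine(reader.ReadToEnd().Contains("<Id>42</Id>"));
        }

        // A partially read source stream is copied in full once it is rewound.
        var source = new MemoryStream(new byte[] { 1, 2, 3, 4, 5 });
        source.ReadByte();
        Console.WriteLine(CopyFromStart(source).Length);
    }
}
```

In the question's code this amounts to adding `outputStream.Position = 0;` (or `outputStream.Seek(0L, SeekOrigin.Begin);`) right after `xmlDocument.Save(outputStream);` in `Parse`. Rewinding `attachmentPartStream` before `CopyTo` is only relevant if an earlier pipeline component may already have read it, and only possible when `CanSeek` is true.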