Tags: c#, biztalk, biztalk-2010, flat-file, custom-pipeline-component

Split Flat Files into smaller files (on row count) using Custom Pipeline


I am trying to create a custom pipeline component for BizTalk 2010 that splits an incoming flat file into multiple smaller files. I want to split the file (say, about 30,000 rows) into files of about 5,000 rows each, or slightly fewer (for example, when the file contains 33,000 rows).

I have tried using Selvan's great example of a custom disassembly pipeline, to no avail.

I have used the Pipeline Component Wizard to generate a pipeline skeleton, but I would be very happy with any tips or pointers on how to code the disassemble stage and split the large file. I am pretty much a newbie at this type of coding.

Any help?


Solution

  • Splitting messages can only be done with a disassembler component. You can create a class that inherits from an existing disassembler (as Selvan did), or you can select the "DisassemblingParser" component type for a receive pipeline in the Pipeline Component Wizard. Inheriting is useful if you can reuse the design-time properties, but it is not necessary.

    When it's run, BizTalk passes the message in via the "Disassemble" method. After this method returns BizTalk starts polling the "GetNext" method until it returns null to get all the output messages. So what you need to design is how you are going to prepare the message in the "Disassemble" method so that you can return the required split messages when BizTalk calls "GetNext".
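
    As a rough illustration of that call sequence, here is a BizTalk-free sketch. `SketchDisassembler` and `PipelineHost` are hypothetical names, and plain `Stream` objects stand in for messages; a real component implements `IDisassemblerComponent`, whose `Disassemble` takes an `IPipelineContext` and an `IBaseMessage` and whose `GetNext` returns an `IBaseMessage`. The split rule (one output per line) is only a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in for a disassembler component (no BizTalk references).
public class SketchDisassembler
{
    private readonly Queue<Stream> _parts = new Queue<Stream>();

    // Called once per incoming message: prepare whatever GetNext will need.
    public void Disassemble(Stream input)
    {
        using (StreamReader reader = new StreamReader(input))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Placeholder split rule for the sketch: one output per line.
                _parts.Enqueue(new MemoryStream(Encoding.UTF8.GetBytes(line)));
            }
        }
    }

    // Called repeatedly; returning null signals that no messages remain.
    public Stream GetNext()
    {
        return _parts.Count > 0 ? _parts.Dequeue() : null;
    }
}

// Hypothetical driver that mimics how the engine polls a disassembler.
public static class PipelineHost
{
    public static List<string> Run(SketchDisassembler component, Stream input)
    {
        List<string> outputs = new List<string>();
        component.Disassemble(input);

        Stream part;
        while ((part = component.GetNext()) != null)
        {
            using (StreamReader reader = new StreamReader(part))
            {
                outputs.Add(reader.ReadToEnd());
            }
        }
        return outputs;
    }
}
```

    Buffering every part in a queue keeps the sketch short, but it holds the whole input in memory, so it is only reasonable for small messages.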

    Selvan's approach is:

    • In "Disassemble" convert the entire flat file to XML with BizTalk's flat file disassembler (base.Disassemble) and let the base class store the XML output
    • The first time BizTalk calls "GetNext", the unsplit XML message is retrieved from the base class (base.GetNext), loaded into an XPathDocument, and split based on node counts. A new message is created for each part and saved in a collection.
    • Each call to "GetNext" returns one message from that collection. Once they have all been returned, the method returns null.

    As he notes, using XPathNavigator is not good for very large messages. It's always best to use XmlReader when you can so that the message can be processed as a stream without being loaded fully into memory. This can be done by redesigning the GetNext process as:

    • First time it's called create the XmlReader for the disassembled XML message stream.
    • For each call to GetNext, use XmlReader to read forward the required number of nodes while writing to a new output stream that is returned with a new BizTalk message.
    • When you've reached the end of the XML message you can close the reader and return null.
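
    The streaming steps above could be sketched roughly as follows, using only standard .NET XML classes. `XmlBatchSplitter` is a hypothetical helper, not part of BizTalk; it assumes the records are the direct children of the root element and that each output should be wrapped in a copy of that root. In a real component, the returned stream would be set as the body part of a new message.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: forward-only split of <root><record/>...</root>.
public class XmlBatchSplitter
{
    private readonly Stream _source;
    private readonly int _recordsPerMessage;
    private XmlReader _reader;
    private string _rootName;
    private string _rootNamespace;
    private bool _finished;

    public XmlBatchSplitter(Stream source, int recordsPerMessage)
    {
        if (recordsPerMessage < 1)
            throw new ArgumentOutOfRangeException("recordsPerMessage");
        _source = source;
        _recordsPerMessage = recordsPerMessage;
    }

    // Returns the next batch as a stream, or null when the input is exhausted.
    public Stream GetNext()
    {
        if (_finished) return null;

        if (_reader == null)
        {
            // First call: open the reader and remember the root element.
            _reader = XmlReader.Create(_source);
            _reader.MoveToContent();
            _rootName = _reader.LocalName;
            _rootNamespace = _reader.NamespaceURI;
            _reader.Read(); // step inside the root
        }

        if (!MoveToNextRecord())
        {
            _reader.Close();
            _finished = true;
            return null;
        }

        // Assumption: one batch fits in memory; a large batch may need a
        // file-backed stream instead of a MemoryStream.
        MemoryStream output = new MemoryStream();
        using (XmlWriter writer = XmlWriter.Create(output))
        {
            writer.WriteStartElement(_rootName, _rootNamespace);
            int written = 0;
            while (written < _recordsPerMessage && MoveToNextRecord())
            {
                // WriteNode copies the element and leaves the reader on the
                // node after it, so no extra Read() call is needed here.
                writer.WriteNode(_reader, false);
                written++;
            }
            writer.WriteEndElement();
        }
        output.Position = 0;
        return output;
    }

    // Skips whitespace and comments; false at the root's end tag or at EOF.
    private bool MoveToNextRecord()
    {
        while (!_reader.EOF)
        {
            if (_reader.NodeType == XmlNodeType.Element) return true;
            if (_reader.NodeType == XmlNodeType.EndElement) return false;
            _reader.Read();
        }
        return false;
    }
}
```

    Only one batch is held in memory at a time, so memory use depends on the batch size rather than on the size of the whole message.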

    From your description it sounds like you might want to output flat files without them being disassembled into XML, in which case I would suggest just saving the input stream when Disassemble is called and then using the same GetNext design but with a StreamReader instead of an XmlReader.
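
    A minimal sketch of that flat-file variant, under a few assumptions: `LineBatchSplitter` is a hypothetical helper, one row equals one text line, there is no header row to repeat in each output, and the output is written as UTF-8. The constructor plays the role of saving the input stream in Disassemble.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: hands out the input file in chunks of N lines.
public class LineBatchSplitter
{
    private readonly StreamReader _reader;
    private readonly int _rowsPerFile;
    private bool _finished;

    public LineBatchSplitter(Stream input, int rowsPerFile)
    {
        if (rowsPerFile < 1)
            throw new ArgumentOutOfRangeException("rowsPerFile");
        _reader = new StreamReader(input);
        _rowsPerFile = rowsPerFile;
    }

    // Returns the next chunk as a stream, or null when no rows remain.
    public Stream GetNext()
    {
        if (_finished) return null;

        MemoryStream output = new MemoryStream();
        // The writer is flushed but not disposed, because disposing it
        // would also close the MemoryStream that is being returned.
        StreamWriter writer = new StreamWriter(output, new UTF8Encoding(false));

        int rows = 0;
        while (rows < _rowsPerFile)
        {
            string line = _reader.ReadLine();
            if (line == null) break; // end of input
            // Note: WriteLine normalizes line endings to Environment.NewLine.
            writer.WriteLine(line);
            rows++;
        }
        writer.Flush();

        if (rows == 0)
        {
            // Nothing left to read: release the input and stop.
            _finished = true;
            _reader.Dispose();
            return null;
        }

        output.Position = 0;
        return output;
    }
}
```

    With 5,000 rows per file, a 33,000-row input would come out as six files of 5,000 rows and a final file of 3,000 rows.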