Tags: wso2, xml-namespaces, large-files, smooks, wso2-esb

Split and route large XML with namespace using WSO2 ESB and smooks


I need to process an XML file that has a namespace declaration on its root element and contains more than 133K sub-elements; its size is around 500 MB. To do this I'm using WSO2 ESB 5 and the Smooks mediator.

Basically, what I'm looking for is to split the input file into small chunks with a predefined structure and send each of them to a queue for later processing.

I first tried an XSLT transformation to remove the namespace from the input file, but I got an OutOfMemoryError like this:

TID: [-1234] [] [2017-03-02 03:04:43,900] ERROR {org.apache.axis2.transport.base.threads.NativeWorkerPool} -  Uncaught exception {org.apache.axis2.transport.base.threads.NativeWorkerPool}
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.axiom.om.impl.llom.factory.OMLinkedListImplFactory.createOMText(OMLinkedListImplFactory.java:192)
    at org.apache.axiom.om.impl.builder.StAXBuilder.createOMText(StAXBuilder.java:294)
    at org.apache.axiom.om.impl.builder.StAXBuilder.createOMText(StAXBuilder.java:250)
    at org.apache.axiom.om.impl.builder.StAXOMBuilder.next(StAXOMBuilder.java:252)
    at org.apache.axiom.om.impl.llom.OMSerializableImpl.build(OMSerializableImpl.java:78)
    at org.apache.axiom.om.impl.llom.OMElementImpl.build(OMElementImpl.java:722)
    at org.apache.axiom.om.impl.llom.OMElementImpl.detach(OMElementImpl.java:700)
    at org.apache.axiom.om.impl.llom.OMNodeImpl.setParent(OMNodeImpl.java:105)
    at org.apache.axiom.om.impl.llom.OMNodeImpl.insertSiblingAfter(OMNodeImpl.java:203)
    at org.apache.synapse.mediators.transform.XSLTMediator.performXSLT(XSLTMediator.java:366)
    at org.apache.synapse.mediators.transform.XSLTMediator.mediate(XSLTMediator.java:202)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:97)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:59)
    at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:158)
    at org.apache.synapse.core.axis2.ProxyServiceMessageReceiver.receive(ProxyServiceMessageReceiver.java:210)
    at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
    at org.apache.axis2.transport.base.AbstractTransportListener.handleIncomingMessage(AbstractTransportListener.java:328)
    at org.apache.synapse.transport.vfs.VFSTransportListener.processFile(VFSTransportListener.java:824)
    at org.apache.synapse.transport.vfs.VFSTransportListener.scanFileOrDirectory(VFSTransportListener.java:472)
    at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:188)
    at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:134)
    at org.apache.axis2.transport.base.AbstractPollingTransportListener$1$1.run(AbstractPollingTransportListener.java:67)
    at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:172)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I did not understand why this happens, because my virtual machine is configured with -Xms4096m -Xmx6144m.
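The trace above suggests why the heap is not enough: XSLTMediator.performXSLT ends up in OMElementImpl.build and StAXOMBuilder.next, i.e. Axiom builds the whole message as an in-memory object tree before transforming it. Such a tree usually needs several times the raw file size, so a 500 MB file can plausibly exhaust a 6 GB heap ("GC overhead limit exceeded" means the JVM spent almost all its time in garbage collection while reclaiming very little). The alternative is to stream: handle one product at a time and discard it. This sketch illustrates that idea with Python's standard-library iterparse, outside the ESB; it is not WSO2 or Smooks code, and the function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Namespace taken from the sample catalog in the question.
NS = "http://www.demandware.com/xml/impex/catalog/2006-10-31"
PRODUCT_TAG = f"{{{NS}}}product"  # Clark notation: {namespace-uri}local-name


def count_products_streaming(source):
    """Count <product> elements without holding the whole document in memory.

    iterparse reports each element when its end tag is read; clearing the
    root afterwards discards every finished child, so memory stays roughly
    bounded by one product instead of growing with the file.
    """
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of <catalog>
    for event, elem in context:
        if event == "end" and elem.tag == PRODUCT_TAG:
            count += 1
            root.clear()  # drop the finished product (and anything before it)
    return count


if __name__ == "__main__":
    xml_text = (
        f'<catalog xmlns="{NS}">'
        '<product product-id="0"/><product product-id="12024"/>'
        "</catalog>"
    )
    print(count_products_streaming(io.BytesIO(xml_text.encode("utf-8"))))  # 2
```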

Based on that error, I decided to implement a streaming solution using Smooks: I defined a VFS proxy service that polls a folder and hands the file to the Smooks mediator. However, I keep getting an error that seems to be related to the namespace declaration on the root element of the input file. I say this because whenever I edit the input file and remove the namespace declaration, what I have defined and deployed on WSO2 ESB works perfectly. The catch is that I receive the large file from a black-box backend system, so I have to deal with the namespace.
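The behaviour described above is consistent with how XML namespaces work: a default xmlns on <catalog> is inherited by every descendant, so product, ean and the rest all live in the demandware namespace, and a lookup that asks for a bare product or ean (no namespace) matches nothing. Deleting the declaration moves the elements back into no namespace, which is why the edited file works. A small illustration with Python's ElementTree; it is an analogy, not the Smooks or FreeMarker lookup itself, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.demandware.com/xml/impex/catalog/2006-10-31"

WITH_NS = f'<catalog xmlns="{NS}"><product product-id="0"><ean/></product></catalog>'
WITHOUT_NS = '<catalog><product product-id="0"><ean/></product></catalog>'


def find_plain_products(xml_text):
    """Look up <product> by bare local name, i.e. in *no* namespace."""
    return ET.fromstring(xml_text).findall("product")


def find_qualified_products(xml_text):
    """Look up <product> by namespace URI plus local name (Clark notation)."""
    return ET.fromstring(xml_text).findall(f"{{{NS}}}product")


if __name__ == "__main__":
    print(len(find_plain_products(WITH_NS)))      # 0: the default namespace hides it
    print(len(find_plain_products(WITHOUT_NS)))   # 1: same lookup once xmlns is removed
    print(len(find_qualified_products(WITH_NS)))  # 1: qualifying the name fixes it
```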

These are the definitions I have on my ESB:

Proxy Service

<?xml version="1.0" encoding="UTF-8"?>
<proxy xmlns="http://ws.apache.org/ns/synapse"
       name="Tryzens_ProductProxy"
       startOnLoad="true"
       statistics="disable"
       trace="disable"
       transports="vfs">
   <target>
      <inSequence>
         <log level="custom">
            <property name="Tryzens_ProductProxy__tracing" value="before smooks"/>
         </log>
         <property name="DISABLE_SMOOKS_RESULT_PAYLOAD" value="true"/>
         <smooks config-key="ProductSplitJMS_Smook">
            <input type="xml"/>
            <output type="xml"/>
         </smooks>
         <log level="custom">
            <property name="Tryzens_ProductProxy__tracing" value="after smooks"/>
         </log>
      </inSequence>
   </target>
   <parameter name="transport.vfs.Streaming">true</parameter>
   <parameter name="transport.PollInterval">15</parameter>
   <parameter name="transport.vfs.ActionAfterProcess">MOVE</parameter>
   <parameter name="transport.vfs.FileURI">vfs:file:///home/jairof/wso2/00_test/working/tryzens/smook_product/</parameter>
   <parameter name="transport.vfs.MoveAfterProcess">vfs:file:///home/jairof/wso2/00_test/working/tryzens/output/</parameter>
   <parameter name="transport.vfs.MoveAfterFailure">vfs:file:///home/jairof/wso2/00_test/working/tryzens/fails/</parameter>
   <parameter name="transport.vfs.FileNamePattern">.*.xml</parameter>
   <parameter name="transport.vfs.ContentType">application/xml</parameter>
   <parameter name="transport.vfs.ActionAfterFailure">MOVE</parameter>
   <description/>
</proxy>

Smooks configuration

<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd" xmlns:xsl="http://www.milyn.org/xsd/smooks/xsl-1.1.xsd" xmlns:core="http://www.milyn.org/xsd/smooks/smooks-core-1.3.xsd" xmlns:jms="http://www.milyn.org/xsd/smooks/jms-routing-1.2.xsd">
      <params>
         <param name="stream.filter.type">SAX</param>
         <param name="default.serialization.on">false</param>
      </params>
      <resource-config selector="product">
         <resource>org.milyn.delivery.DomModelCreator</resource>
      </resource-config>
      <jms:router routeOnElement="product" beanId="productItem_xml" destination="dynamicQueues/TestFL">
         <jms:connection factory="QueueConnectionFactory"/>
         <jms:jndi contextFactory="org.apache.activemq.jndi.ActiveMQInitialContextFactory" providerUrl="tcp://localhost:61616"/>
         <jms:highWaterMark mark="-1"/>
      </jms:router>
      <ftl:freemarker applyOnElement="product">
         <ftl:template>/repository/resources/smooks/product.ftl</ftl:template>
         <ftl:use>
            <ftl:bindTo id="productItem_xml"/>
         </ftl:use>
      </ftl:freemarker>
</smooks-resource-list>

Smooks template

This template is only for testing purposes; the real one covers the complete structure of the product element, but this one is enough to reproduce the error:

<#ftl ns_prefixes={"ns1": "http://www.demandware.com/xml/impex/catalog/2006-10-31"}>
<product id='${.vars["product"]["@product-id"]}'>
    <ean>${product.ean}</ean>        
</product>
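Taken together, the Smooks configuration and template above aim to turn each <product> into a small namespace-free fragment and send it to a JMS queue. As a rough, non-ESB sketch of that split-and-route flow: the code streams the catalog, renders the same shape as the test template (an id attribute plus ean) for each product, and puts it on a queue.Queue that merely stands in for dynamicQueues/TestFL. It does not talk to ActiveMQ, and the function names are hypothetical.

```python
import io
import queue
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

NS = "http://www.demandware.com/xml/impex/catalog/2006-10-31"
PRODUCT = f"{{{NS}}}product"
EAN = f"{{{NS}}}ean"


def render_chunk(product):
    """Build the same small, namespace-free fragment as the test template."""
    product_id = product.get("product-id", "")
    ean = product.findtext(EAN, default="")  # "" for an empty <ean/>
    return f"<product id={quoteattr(product_id)}><ean>{escape(ean)}</ean></product>"


def split_and_route(source, out_queue):
    """Stream the catalog and put one chunk per <product> on out_queue.

    out_queue is an in-process stand-in for the JMS destination,
    not a JMS client.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # start of <catalog>
    routed = 0
    for event, elem in context:
        if event == "end" and elem.tag == PRODUCT:
            out_queue.put(render_chunk(elem))
            routed += 1
            root.clear()  # release the product we just routed
    return routed
```

In the Smooks configuration the same roles are, as far as I can tell, split between DomModelCreator (builds a small DOM per product), the freemarker resource (renders it into productItem_xml) and jms:router (sends that bean to the queue).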

Sample input file

Note that the actual file has more than 133K products; for this sample I cut most of the file and left only two products:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.demandware.com/xml/impex/catalog/2006-10-31" catalog-id="tml-catalog-en">
    <header>
        <image-settings>
            <internal-location base-path="/images"/>
            <view-types>
                <view-type>original</view-type>
                <view-type>portrait</view-type>
                <view-type>badge_GBP</view-type>
                <view-type>badge_EUR</view-type>
                <view-type>badge_USD</view-type>
                <view-type>badge_AUD</view-type>
                <view-type>badge_CZH</view-type>
                <view-type>ctlimage</view-type>
                <view-type>badge_FRA</view-type>
                <view-type>badge_GER</view-type>
                <view-type>landscape</view-type>
            </view-types>
            <alt-pattern>${productname}, ${variationvalue}, ${viewtype}</alt-pattern>
            <title-pattern>${productname}, ${variationvalue}</title-pattern>
        </image-settings>
    </header>

    <category category-id="MensShoes">
        <display-name xml:lang="de-DE">Schuhe</display-name>
        <display-name xml:lang="x-default">Shoes</display-name>
        <display-name xml:lang="fr-FR">Chaussures</display-name>
        <online-flag>true</online-flag>
        <parent>MENSWEAR</parent>
        <position>12.0</position>
        <image>images/slot/landing/men_menlanding_H1_GBP.jpg</image>
        <template/>
        <page-attributes/>
        <custom-attributes>
            <custom-attribute attribute-id="categoryRecommendationsEnable">false</custom-attribute>
            <custom-attribute attribute-id="enableCompare">false</custom-attribute>
            <custom-attribute attribute-id="enableGridItemButtonStrip">false</custom-attribute>
            <custom-attribute attribute-id="enableGridItemMobileButtonStrip">false</custom-attribute>
            <custom-attribute attribute-id="enableUserJourney">false</custom-attribute>
            <custom-attribute attribute-id="enableWishlist">false</custom-attribute>
            <custom-attribute attribute-id="fitsme_enabled">false</custom-attribute>
            <custom-attribute attribute-id="rrGenere">false</custom-attribute>
            <custom-attribute attribute-id="rsCategoryEnabled">false</custom-attribute>
            <custom-attribute attribute-id="shopAllButton">false</custom-attribute>
            <custom-attribute attribute-id="showInMenu">true</custom-attribute>
            <custom-attribute attribute-id="showInMobileMenu">false</custom-attribute>
            <custom-attribute attribute-id="show_alternate_image_on_plp">false</custom-attribute>
            <custom-attribute attribute-id="slotBannerImage">images/slot/landing/men_menlanding_H1_GBP.jpg</custom-attribute>
        </custom-attributes>
    </category>

    <category category-id="P50 SUIT">
        <display-name xml:lang="de-DE">Hosen</display-name>
        <display-name xml:lang="x-default">Trousers</display-name>
        <display-name xml:lang="fr-FR">Pantalons</display-name>
        <online-flag>true</online-flag>
        <parent>WomensTailoring</parent>
        <position>0.0</position>
        <template/>
        <page-attributes/>
    </category>

    <product product-id="0">
        <ean/>
        <upc/>
        <unit/>
        <min-order-quantity>1</min-order-quantity>
        <step-quantity>1</step-quantity>
        <store-force-price-flag>false</store-force-price-flag>
        <store-non-inventory-flag>false</store-non-inventory-flag>
        <store-non-revenue-flag>false</store-non-revenue-flag>
        <store-non-discountable-flag>false</store-non-discountable-flag>
        <online-flag>false</online-flag>
        <available-flag>true</available-flag>
        <searchable-flag>true</searchable-flag>
        <images>
            <image-group view-type="badge_EUR">
                <image path="badge/blank.png"/>
            </image-group>
            <image-group view-type="badge_GBP">
                <image path="badge/blank.png"/>
            </image-group>
            <image-group view-type="badge_GER">
                <image path="badge/blank.png"/>
            </image-group>
            <image-group view-type="badge_USD">
                <image path="badge/blank.png"/>
            </image-group>
        </images>
        <page-attributes/>
        <pinterest-enabled-flag>false</pinterest-enabled-flag>
        <facebook-enabled-flag>false</facebook-enabled-flag>
        <store-attributes>
            <force-price-flag>false</force-price-flag>
            <non-inventory-flag>false</non-inventory-flag>
            <non-revenue-flag>false</non-revenue-flag>
            <non-discountable-flag>false</non-discountable-flag>
        </store-attributes>
    </product>

    <product product-id="12024">
        <ean/>
        <upc/>
        <unit/>
        <min-order-quantity>1</min-order-quantity>
        <step-quantity>1</step-quantity>
        <store-force-price-flag>false</store-force-price-flag>
        <store-non-inventory-flag>false</store-non-inventory-flag>
        <store-non-revenue-flag>false</store-non-revenue-flag>
        <store-non-discountable-flag>false</store-non-discountable-flag>
        <online-flag>false</online-flag>
        <available-flag>true</available-flag>
        <searchable-flag>true</searchable-flag>
        <images>
            <image-group view-type="original">
                <image path="original/12024_original_original.jpg"/>
            </image-group>
        </images>
        <brand>J FRANCOMB</brand>
        <page-attributes/>
        <custom-attributes>
            <custom-attribute attribute-id="allocGroup">X</custom-attribute>
            <custom-attribute attribute-id="colour">
                <value>3PNK-PINK</value>
            </custom-attribute>
            <custom-attribute attribute-id="cuffType">
                <value>SINGLE CUFF</value>
            </custom-attribute>
            <custom-attribute attribute-id="enable_pdp_asset_footer_layout">false</custom-attribute>
            <custom-attribute attribute-id="fabric">
                <value>LEWIN 100 PD</value>
            </custom-attribute>
            <custom-attribute attribute-id="fit">SEMI FIT</custom-attribute>
            <custom-attribute attribute-id="gender">
                <value>M</value>
            </custom-attribute>
            <custom-attribute attribute-id="look">PTRN447</custom-attribute>
            <custom-attribute attribute-id="pattern">
                <value>PATTERN</value>
            </custom-attribute>
            <custom-attribute attribute-id="productIDCIMS">12024</custom-attribute>
            <custom-attribute attribute-id="retailTypeCIMS">M FORMAL</custom-attribute>
            <custom-attribute attribute-id="seasonCIMS">307B</custom-attribute>
            <custom-attribute attribute-id="styleName">MILSC PATTERN DOOM AND BLOOM</custom-attribute>
            <custom-attribute attribute-id="styleNameCIMS">MILSC PATTERN DOOM AND BLOOM</custom-attribute>
            <custom-attribute attribute-id="styleNumberCIMS">MS17</custom-attribute>
            <custom-attribute attribute-id="typeDesc">MS SHIRTS</custom-attribute>
            <custom-attribute attribute-id="weight">0.3</custom-attribute>
        </custom-attributes>
        <options>
            <shared-option option-id="sleeveLengthAlteration"/>
            <shared-option option-id="giftBox"/>
        </options>
        <variations>
            <attributes>
                <shared-variation-attribute attribute-id="collarSize" variation-attribute-id="collarSize"/>
                <shared-variation-attribute attribute-id="sleeveLength" variation-attribute-id="sleeveLength"/>
            </attributes>
        </variations>
        <classification-category>S17 MILAN</classification-category>
        <pinterest-enabled-flag>false</pinterest-enabled-flag>
        <facebook-enabled-flag>false</facebook-enabled-flag>
        <store-attributes>
            <force-price-flag>false</force-price-flag>
            <non-inventory-flag>false</non-inventory-flag>
            <non-revenue-flag>false</non-revenue-flag>
            <non-discountable-flag>false</non-discountable-flag>
        </store-attributes>
    </product>

    <category-assignment category-id="T43 HERITAGE" product-id="505158991125">
        <primary-flag>true</primary-flag>
    </category-assignment>
    <category-assignment category-id="U30 BOXERS" product-id="505158774834"/>
    <recommendation source-id="58462" source-type="product" target-id="505158886294" type="4"/>
</catalog>

Error in wso2carbon.log file

TID: [-1234] [] [2017-03-02 12:15:27,793]  INFO {org.apache.synapse.mediators.builtin.LogMediator} -  Tryzens_ProductProxy__tracing = before smooks {org.apache.synapse.mediators.builtin.LogMediator}
TID: [-1234] [] [2017-03-02 12:15:28,376] ERROR {freemarker.runtime} -   {freemarker.runtime}

Error on line 3, column 12 in repository/resources/smooks/product.ftl
Expecting a string, date or number here, Expression product.ean is instead a freemarker.ext.dom.NodeListModel
The problematic instruction:
----------
==> ${product.ean} [on line 3, column 10 in repository/resources/smooks/product.ftl]
----------

Java backtrace for programmers:
----------
freemarker.core.NonStringException: Error on line 3, column 12 in repository/resources/smooks/product.ftl
Expecting a string, date or number here, Expression product.ean is instead a freemarker.ext.dom.NodeListModel
    at freemarker.core.Expression.getStringValue(Expression.java:126)
    at freemarker.core.Expression.getStringValue(Expression.java:93)
    at freemarker.core.DollarVariable.accept(DollarVariable.java:76)
    at freemarker.core.Environment.visit(Environment.java:209)
    at freemarker.core.MixedContent.accept(MixedContent.java:92)
    at freemarker.core.Environment.visit(Environment.java:209)
    at freemarker.core.Environment.process(Environment.java:189)
    at freemarker.template.Template.process(Template.java:237)
    at org.milyn.templating.freemarker.FreeMarkerTemplateProcessor.applyTemplate(FreeMarkerTemplateProcessor.java:358)
    at org.milyn.templating.freemarker.FreeMarkerTemplateProcessor.applyTemplate(FreeMarkerTemplateProcessor.java:346)
    at org.milyn.templating.freemarker.FreeMarkerTemplateProcessor.visitAfter(FreeMarkerTemplateProcessor.java:333)
    at org.milyn.delivery.sax.SAXHandler.visitAfter(SAXHandler.java:389)
    at org.milyn.delivery.sax.SAXHandler.endElement(SAXHandler.java:204)
    at org.milyn.delivery.SmooksContentHandler.endElement(SmooksContentHandler.java:96)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.milyn.delivery.sax.SAXParser.parse(SAXParser.java:76)
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:86)
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:64)
    at org.milyn.Smooks._filter(Smooks.java:526)
    at org.milyn.Smooks.filterSource(Smooks.java:482)
    at org.wso2.carbon.mediator.transform.SmooksMediator.mediate(SmooksMediator.java:146)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:97)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:59)
    at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:158)
    at org.apache.synapse.core.axis2.ProxyServiceMessageReceiver.receive(ProxyServiceMessageReceiver.java:210)
    at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
    at org.apache.axis2.transport.base.AbstractTransportListener.handleIncomingMessage(AbstractTransportListener.java:328)
    at org.apache.synapse.transport.vfs.VFSTransportListener.processFile(VFSTransportListener.java:824)
    at org.apache.synapse.transport.vfs.VFSTransportListener.scanFileOrDirectory(VFSTransportListener.java:472)
    at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:188)
    at org.apache.synapse.transport.vfs.VFSTransportListener.poll(VFSTransportListener.java:134)
    at org.apache.axis2.transport.base.AbstractPollingTransportListener$1$1.run(AbstractPollingTransportListener.java:67)
    at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:172)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Please help; I would appreciate any comments on how to solve this issue. Thanks in advance.


Solution

  • In the Smooks template (.ftl file), if you want to use something like ${product.ean}, you must define the "product" variable:

    <#assign product = .vars["product"]>
    

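    In your XML input file, all nodes belong to the same default namespace "http://www.demandware.com/xml/impex/catalog/2006-10-31".

    You can bind this default namespace in FTL with the reserved prefix "D": <#ftl ns_prefixes={"D":"http://www.demandware.com/xml/impex/catalog/2006-10-31"}>. With "D" declared, unprefixed names such as product.ean are resolved in that namespace. Your template binds the URI to "ns1" instead, so product.ean looks for an ean element with no namespace, finds none and yields an empty node list, which is why FreeMarker reports a NodeListModel instead of a string. Alternatively, keep "ns1" and qualify the name: ${product["ns1:ean"]}.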
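The same resolution rule can be seen outside FreeMarker. In Python's ElementTree (3.8 or later for the empty-string key), mapping "" to the catalog namespace plays the role of the reserved D prefix, while an explicit prefix plays the role of keeping ns1 and writing product["ns1:ean"]. This is only an analogy for how unprefixed names are resolved, not FreeMarker code; the sample element is trimmed from product 12024 in the question.

```python
import xml.etree.ElementTree as ET

NS = "http://www.demandware.com/xml/impex/catalog/2006-10-31"
PRODUCT_XML = (
    f'<product xmlns="{NS}" product-id="12024">'
    "<ean/><brand>J FRANCOMB</brand></product>"
)

product = ET.fromstring(PRODUCT_XML)

# Unprefixed name, no default namespace bound: no match, so None
# (the situation of ns_prefixes={"ns1": ...} combined with ${product.ean}).
plain = product.findtext("ean")

# Default namespace bound to "" (like FreeMarker's reserved "D" prefix):
# the empty <ean/> is found, so its text is "" rather than None.
with_default = product.findtext("ean", namespaces={"": NS})

# Explicit prefix (like keeping "ns1" and writing product["ns1:ean"]).
with_prefix = product.findtext("ns1:ean", namespaces={"ns1": NS})

# Non-empty element resolved through the default-namespace mapping.
brand = product.findtext("brand", namespaces={"": NS})
```

The None versus "" distinction mirrors the template error: the failing expression found no node at all, whereas the sample's <ean/> simply has empty content once the namespace is resolved.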