Search code examples
xmloracletalendstax

Updating a big XML using Talend with StAX module


I'm trying to create a Talend job, to update a big XML (11 million Lines, 300 Mb).

Was thinking about using StAX in a tJava module which would open the XML then read some data into an Oracle Database and update this data into the XML.

My problem is that I don't really know how to use stax in Talend taking in parameter each row from the database and updating the data this row is referring to into the XML.

How should I organize Talend's modules to develop that? is StAX the good way to go?

The Oracle table contains

IDHOME  OCCADRESS   STREET              POSTALCODE  CITY
123     1           5 rue de la folie   123456      In Da City
123     2           18 rue de la joie   123457      out Da City
123     3           25  av du moulin    65882       EFDFEA
124     1           12 rue de la taver  123456      ZAER
124     2           18 rue de la joie   123456      SFFEF
125     1           46 bd le roi        39817       DIPFL

My xml is like :

<root>
  <Home>
    <homeDatas>
       <idHome>123</idHome>
        ....
        .....
    </HomeDatas>
    <peoples>
        <people>...</people>
        <people>...</people>
        <people>...</people>
        <people>...</people>
    </peoples>
    <Adresses>
        <adress>
             <OccAdress>1</OccAdress>
             <street>12 rue de la Paix</street>
             <postalCode>123456</postalCode>
             <city>Da City</city>

        </adress>
        <adress>
             <OccAdress>2</OccAdress>
             .......
        </adress>
        <adress>...</adress>
    </Adresses>
  </Home>
</root>

I have a first job in Talend that read the XML then extract all the adresses from a home. Then I enrich and normalize the adresses, doesn't matter how, and put it in an oracle database.

I would like to create a job(routine) that for each line of the oracle database get the with the IDHOME then with the OCCADRESS grab the .... and update , , in this with the data from the current line.

The main problem is that I have no idea what I could/should use to "Update" the XML but talend and StAX seemed like a good idea to do it. I was hoping to have some help to create the tJava taking in args all the line to update + original XML file and do the job:

I actually dropped StAX and looking to use this : http://vtd-xml.sourceforge.net/

Thanks for your help :-D


Solution

  • You can use standard Talend components to update this xml file.

    Standard Xml component allows to read very big xml files. On tFileInput xml, open the "Advanced Settings" parameters. Then select "Fast with low memory consumption (SAX)".

    enter image description here

    Then you can design a job like the following in order to merge data between your xml data source and your oracle database.

    enter image description here

    You need to extract all the data with tFileInputXml with a loop on the address node. Then you will get list of adresses joined with their idHome. Check the "getNodes" option to keep and in xml format like in the following screenshot.

    enter image description here

    Then you just have to make a mapping in the tMap with the keys "idHome" and "OccAdress". Then map the output with the result of your join operation

    enter image description here When done, check that you get the expected result as output and connect the tAdvancedOutputXml to write your xml file.