I'm trying to create a Talend job, to update a big XML (11 million Lines, 300 Mb).
Was thinking about using StAX in a tJava module which would open the XML then read some data into an Oracle Database and update this data into the XML.
My problem is that I don't really know how to use stax in Talend taking in parameter each row from the database and updating the data this row is referring to into the XML.
How should I organize Talend's modules to develop that? is StAX the good way to go?
The Oracle table contains
IDHOME OCCADRESS STREET POSTALCODE CITY
123 1 5 rue de la folie 123456 In Da City
123 2 18 rue de la joie 123457 out Da City
123 3 25 av du moulin 65882 EFDFEA
124 1 12 rue de la taver 123456 ZAER
124 2 18 rue de la joie 123456 SFFEF
125 1 46 bd le roi 39817 DIPFL
My xml is like :
<root>
<Home>
<homeDatas>
<idHome>123</idHome>
....
.....
</HomeDatas>
<peoples>
<people>...</people>
<people>...</people>
<people>...</people>
<people>...</people>
</peoples>
<Adresses>
<adress>
<OccAdress>1</OccAdress>
<street>12 rue de la Paix</street>
<postalCode>123456</postalCode>
<city>Da City</city>
</adress>
<adress>
<OccAdress>2</OccAdress>
.......
</adress>
<adress>...</adress>
</Adresses>
</Home>
</root>
I have a first job in Talend that read the XML then extract all the adresses from a home. Then I enrich and normalize the adresses, doesn't matter how, and put it in an oracle database.
I would like to create a job(routine) that for each line of the oracle database get the with the IDHOME then with the OCCADRESS grab the .... and update , , in this with the data from the current line.
The main problem is that I have no idea what I could/should use to "Update" the XML but talend and StAX seemed like a good idea to do it. I was hoping to have some help to create the tJava taking in args all the line to update + original XML file and do the job:
I actually dropped StAX and looking to use this : http://vtd-xml.sourceforge.net/
Thanks for your help :-D
You can use standard Talend components to update this xml file.
Standard Xml component allows to read very big xml files. On tFileInput xml, open the "Advanced Settings" parameters. Then select "Fast with low memory consumption (SAX)".
Then you can design a job like the following in order to merge data between your xml data source and your oracle database.
You need to extract all the data with tFileInputXml with a loop on the address node. Then you will get list of adresses joined with their idHome. Check the "getNodes" option to keep and in xml format like in the following screenshot.
Then you just have to make a mapping in the tMap with the keys "idHome" and "OccAdress". Then map the output with the result of your join operation
When done, check that you get the expected result as output and connect the tAdvancedOutputXml to write your xml file.