Search code examples
xmlxpathpentahokettlespoon

How to loop through elements in XML to shred and load to database


I have a requirement where an incoming XML has to be shredded and loaded into database. All elements have their respective tables. Incoming XML looks something like this:

<root>
  <creditreport>
    <data1>
      <A>val1</A>
      <B>val2</B>
    </data1>
    <data2>
      <C>val3</C>
      <D>val4</D>
    </data2>
    <data3>
     <E>val5</E>
     <F>val6</F>
    </data3>
    <data3>
     <G>val7</G>
     <H>val8</H>
    </data3>
  </creditreport>
</root>

Now In Kettle I am designing a generic framework that would fetch the XML and shred it into database. I am using 'Get XML Data' Component to read XML. I have defined 'loop Xpath' as root/creditreport and then I have read fields one-by-one something as:

name   xpath   Element  ResultType 
 A      data1   Node     Valueof 
 B      data1   Node     Valueof 
 ..... 
 .....
 .....
 E     data3    Node     Valueof
 .....
 ..... 
 G     data3    Node     Valueof

But the problem is, it's shredding only first row for and missing the second one. I can understand the reason as XPATH loop is only till . If I define 'xpath loop' as 'root/creditreport/data3' then problem for element 'data3' gets resolved but there are other elements too which can repeat and then I would be standing again at the starting point of my problem.

Any advice!!


Solution

  • If both parent (dataX) and child nodes (A,B,C,etc) are really unique/sequential you can make one very generic setup:

    Use /root/creditreport/*/* as the xpath loop path

    Set up the fields like this manually:

    Field definitions in Get Data from XML

    That should get you an output like this:

    Output sample

    From here you can denormalize or otherwise process your data as you wish. Note that I've added a sequence number for the nodes at the parent level, so you can distinguish the first Data3 from the second, etc.

    If on the other hand your dataX nodes all have the same child nodes (A,B A,B rather than A,B C,D etc), you can use /root/creditreport/* as the xpath loop path and skip the parent node field, configuring your value node A and B normally.

    Here are the definitions and output. All fields are defined relative to the current node (.).

    enter image description here

    enter image description here