xml, pentaho, pentaho-spoon, pentaho-data-integration

How can I extract data from XML using Pentaho when an XML tag is repeated?


I am extracting data from an XML file. Each row contains the same tag twice, with different values. How can I get these values into separate columns?

<table>
  <tr>
    <td>A</td>
    <td>B</td>
  </tr>
  <tr>
    <td>A1</td>
    <td>B2</td>
  </tr>
</table>

I want to get those values into different columns. How can I achieve this? Any help would be appreciated.


Solution

  • The difficult part is making Kettle understand which column each value belongs in.

    1. In the Content panel, set the Loop XPath to "/table/tr". PDI will then produce one row per <tr> element.
    2. In the Fields panel, define a first column named "col1" with XPath "td[1]", and a second column named "col2" with XPath "td[2]".
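
To see what these two settings select, here is a minimal sketch outside PDI, using Python's standard xml.etree.ElementTree on the sample XML from the question. It only illustrates the XPath semantics; it is not how Kettle implements the step, and the function name extract_rows is made up for this example. ElementTree does not accept an absolute path on an element, so the loop uses "tr" relative to the <table> root, which selects the same nodes as "/table/tr".

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<table>
  <tr>
    <td>A</td>
    <td>B</td>
  </tr>
  <tr>
    <td>A1</td>
    <td>B2</td>
  </tr>
</table>"""


def extract_rows(xml_text):
    """Return one dict per <tr>, with the first and second <td> as col1 and col2."""
    root = ET.fromstring(xml_text)  # root is the <table> element
    rows = []
    # Counterpart of the Loop XPath "/table/tr": one output row per <tr>.
    for tr in root.findall("tr"):
        rows.append({
            # Positional predicates are 1-based, as in XPath.
            "col1": tr.findtext("td[1]"),
            "col2": tr.findtext("td[2]"),
        })
    return rows


rows = extract_rows(XML)
for row in rows:
    print(row)
```

Each <tr> becomes one row, and the positional predicates decide which <td> lands in which column. If a <tr> has no second <td>, findtext returns None for col2 in this sketch.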

    If the number of columns is dynamic, you need metadata injection.

    For your information, the Repeat checkbox does not repeat a field; it instructs Kettle to use the value from the previous row when a field is missing on a row.
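
The behaviour described for the repeat checkbox amounts to a forward fill. The sketch below is plain Python, not Kettle code. The helper name apply_repeat is hypothetical, and treating only None as "missing" is an assumption made for the illustration.

```python
def apply_repeat(rows, field):
    """Return copies of rows where a missing `field` takes the previous row's value."""
    last = None  # most recent value seen for `field`
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if new_row.get(field) is None:
            new_row[field] = last  # field missing: reuse the previous row's value
        else:
            last = new_row[field]  # field present: remember it for later rows
        filled.append(new_row)
    return filled


sample = [
    {"col1": "A", "col2": "B"},
    {"col1": "A1", "col2": None},  # second <td> missing on this row
]
print(apply_repeat(sample, "col2"))
```

Here the second row's col2 is filled with the value carried over from the first row, which is different from splitting repeated tags into separate columns.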
