Search code examples
xmlhadoophiveapache-pig

Pig Explode XML from Column in HCat


I have data in a hive table that looks like this:

col1    |    col2
12345       <AnXMLElement><AChildElement1>2</AChildElement1><AChildElement2>4</AChildElement2></AnXMLElement>

I want to take the second column and explode it out into elements like so:

col1    |    col2    |  AChildElement1   |   AChildElement2
12345    <OriginalData>      2                   4

And write it out to a CSV file.

I can use the Hive/Hcat loader to pull the data in and filter it down to these two columns, but I'm confused on how to use XPath or XMLoader to load a column from an already filtered set of results. Is there a way to use XPath or XMLoader to load from a column only instead of a file?

Thanks in advance.


Solution

  • Use as below :-

    select *,xpath_string(z,'//AChildElement1'),xpath_string(z,'//AChildElement2') from table;
    

    For more XMLpath related functions : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+XPathUDF

    or you can also use XMLserde approach.