I have data in a hive table that looks like this:
col1 | col2
12345 <AnXMLElement><AChildElement1>2</AChildElement1><AChildElement2>4</AChildElement2></AnXMLElement>
I want to take the second column and explode it out into elements like so:
col1 | col2 | AChildElement1 | AChildElement2
12345 <OriginalData> 2 4
And write it out to a CSV file.
I can use the Hive/Hcat loader to pull the data in and filter it down to these two columns, but I'm confused on how to use XPath or XMLoader to load a column from an already filtered set of results. Is there a way to use XPath or XMLoader to load from a column only instead of a file?
Thanks in advance.
Use as below :-
select *,xpath_string(z,'//AChildElement1'),xpath_string(z,'//AChildElement2') from table;
For more XMLpath related functions : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+XPathUDF
or you can also use XMLserde approach.