I'm trying to use Apache Drill (for the first time) on a JSON file that looks like this:
{
  "Key1": {
    "htmltags": "<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"
  },
  "Key2": {
    "htmltags": "<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />"
  },
  "Key3": {
    "htmltags": "<htmltag attr1='november' /><htmltag attr2='foxtrot' /><htmltag attr3='sierra' />"
  }
}
My initial query was the "hello world" of Drill: SELECT * FROM DataFile.json. It returned the columns Key1, Key2, and Key3. The result had only one row, and it contained the entry:
"<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"
[i.e., only the entry Key1.htmltags].
I have two questions:
Unfortunately, it looks like Drill (v1.1.0 on Homebrew as of this writing) isn't the right tool for the job.
Hence, I'll go with an XML parser, DOM tree crawler, or the like, and use a bash string function (or awk/tee) to extract the target tag strings.
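For anyone landing here with the same data shape, here is a minimal sketch of the XML-parser route in Python rather than bash. It assumes the htmltags strings are always well-formed XML fragments, as in the sample above; the function name extract_attrs and the wrapper element name root are my own and purely illustrative.

```python
import json
import xml.etree.ElementTree as ET

# Sample data copied from the JSON file shown in the question.
SAMPLE = """
{
  "Key1": {"htmltags": "<htmltag attr1='bravo' /><htmltag attr2='delta' /><htmltag attr3='charlie' />"},
  "Key2": {"htmltags": "<htmltag attr1='kilo' /><htmltag attr2='lima' /><htmltag attr3='mike' />"},
  "Key3": {"htmltags": "<htmltag attr1='november' /><htmltag attr2='foxtrot' /><htmltag attr3='sierra' />"}
}
"""


def extract_attrs(json_text):
    """Return {top-level key: {attribute name: attribute value}}.

    Hypothetical helper: assumes every top-level value has an "htmltags"
    string made of well-formed, self-closing <htmltag ... /> elements.
    """
    data = json.loads(json_text)
    result = {}
    for key, value in data.items():
        # The fragment has several sibling elements and no single root,
        # so wrap it in a dummy root element before parsing.
        root = ET.fromstring("<root>" + value["htmltags"] + "</root>")
        attrs = {}
        for tag in root.iter("htmltag"):
            attrs.update(tag.attrib)
        result[key] = attrs
    return result


if __name__ == "__main__":
    for key, attrs in extract_attrs(SAMPLE).items():
        print(key, attrs)
```

If the real tags turn out not to be well-formed XML (unclosed tags, unquoted attributes), ElementTree will raise a ParseError, and a lenient HTML parser would be the safer choice.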