
XQuery XML: tokenize a string


I'm new to XQuery and can't get the following to work:

<measInfo measInfoId="1542455297">
  <measTypes>1542455297 1542455298 1542455299 1542455300 1542455301 1542455302 1542455303 1542455304 1542455305 1542455306 1542455307 1542460296 1542460297 </measTypes>
  <measValue measObjLdn="LTHAB0113422/ETHPORT:Cabinet No.=0, Subrack No.=1, Slot No.=7, Port No.=0, Subboard Type=BASE_BOARD">
    <measResults>116967973 585560 496041572 682500 0 12583680 72080 520454 46670568 73432 2205837 1000000 1000000 </measResults>
  </measValue>
  <measValue measObjLdn="LTHAB0113422/ETHPORT:Cabinet No.=0, Subrack No.=1, Slot No.=7, Port No.=1, Subboard Type=BASE_BOARD">
    <measResults>0 0 0 0 0 0 0 0 0 0 0 0 0 </measResults>
  </measValue>
</measInfo>

I'm using //measInfo/measTypes/fn:tokenize(text(),'\s+'). I was hoping it would return a record for each space-delimited value; however, it returns the same as //measInfo/measTypes/text().

What am I doing wrong?


Solution

  • In XQuery 3.0 (as implemented by BaseX), this actually does work:

    declare context item := document {
    <measInfo measInfoId="1542455297">
    <measTypes>1542455297 1542455298 1542455299 1542455300 1542455301 1542455302 1542455303 1542455304 1542455305 1542455306 1542455307 1542460296 1542460297 </measTypes>
    <measValue measObjLdn="LTHAB0113422/ETHPORT:Cabinet No.=0, Subrack No.=1, Slot No.=7, Port No.=0, Subboard Type=BASE_BOARD">
        <measResults>116967973 585560 496041572 682500 0 12583680 72080 520454 46670568 73432 2205837 1000000 1000000 </measResults>
    </measValue>
    <measValue measObjLdn="LTHAB0113422/ETHPORT:Cabinet No.=0, Subrack No.=1, Slot No.=7, Port No.=1, Subboard Type=BASE_BOARD">
        <measResults>0 0 0 0 0 0 0 0 0 0 0 0 0 </measResults>
    </measValue>
    </measInfo>
    };
    
    for $item in //measInfo/measTypes/fn:tokenize(text(),'\s+')
    return <item>{$item}</item>
    

    ...returns...

    <item>1542455297</item>
    <item>1542455298</item>
    <item>1542455299</item>
    <item>1542455300</item>
    <item>1542455301</item>
    <item>1542455302</item>
    <item>1542455303</item>
    <item>1542455304</item>
    <item>1542455305</item>
    <item>1542455306</item>
    <item>1542455307</item>
    <item>1542460296</item>
    <item>1542460297</item>
    <item/>
    

    Putting <item> around each result makes each item visually distinct in the rendered output. Without it, a serializer will typically join adjacent string values with a single space, so the result looks just like the original text, and it isn't obvious whether fn:tokenize() split it into multiple items or not. That is most likely why the query in the question appeared to return the same thing as text().
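    The final <item/> in the output above is an empty string: measTypes ends with a space, and fn:tokenize() returns a zero-length token after a trailing separator. A minimal sketch of one way to avoid it is to apply fn:normalize-space() before splitting, which trims leading and trailing whitespace and collapses internal runs to single spaces. The document below is a hypothetical, trimmed-down copy of the question's measTypes element (three values instead of thirteen), used only to keep the example short.

    ```xquery
    (: Hypothetical, trimmed-down copy of the question's data; note the trailing space. :)
    let $doc := document {
      <measInfo measInfoId="1542455297">
        <measTypes>1542455297 1542455298 1542455299 </measTypes>
      </measInfo>
    }
    (: normalize-space() strips the trailing space and collapses whitespace runs,
       so splitting on a single space produces no empty final token. :)
    for $item in $doc//measInfo/measTypes/fn:tokenize(fn:normalize-space(.), ' ')
    return <item>{$item}</item>
    ```

    This should return three <item> elements and no trailing <item/>. In an XQuery 3.1 processor, the one-argument form fn:tokenize(.) is specified to normalize whitespace and split on spaces in the same way.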


    Another way to do this is to inject literal newlines:

    for $item in //measInfo/measTypes/fn:tokenize(text(),'\s+')
    return ($item, "&#10;")
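    If a single block of newline-separated text is what you want, a possible alternative (a sketch, again using a hypothetical trimmed-down document) is to join the tokens with fn:string-join(), so the query returns one string rather than a sequence that the serializer has to separate.

    ```xquery
    (: Hypothetical, trimmed-down copy of the question's data. :)
    let $doc := document {
      <measInfo measInfoId="1542455297">
        <measTypes>1542455297 1542455298 1542455299 </measTypes>
      </measInfo>
    }
    (: Split on whitespace, then join the tokens with newline characters
       into a single string value. :)
    return fn:string-join(
      $doc//measInfo/measTypes/fn:tokenize(fn:normalize-space(.), ' '),
      '&#10;'
    )
    ```

    Because the result is one string, the serializer emits it as-is, with each value on its own line.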