
Breaking up a sequence of id numbers using tokenize


I have an XML file that looks like this:

<cosmetics>
<cosmetic id="0" itemIDs="1879053261,1879053932,1879054863"/>
<cosmetic id="1" itemIDs="1879176339"/>
<cosmetic id="2" itemIDs="1879115954"/>
<cosmetic id="3" itemIDs="1879051065,1879057689"/>
</cosmetics>

What I want is a file that looks like this:

<cosmetics>
<cosmetic id="0">
    <item id="1879053261"/>
    <item id="1879053932"/>
    <item id="1879054863"/>
</cosmetic>
<cosmetic id="1">
    <item id="1879176339"/>
</cosmetic>
etc

I feel like I am close to a solution in XQuery using the tokenize function, something like this:

for $newcosmetic in doc("cosmetics.xml")/cosmetics/cosmetic/
return <cosmetic id="{$newcosmetic/@id}">
    for $itemid in $newcosmetic/tokenize(@itemIDs,",") 
    return <itemid id="{$itemid}"/>
</cosmetic>

But I'm definitely getting something wrong. The code above probably only demonstrates how new I am to XQuery. I've searched Stack Overflow and elsewhere, but I can't find an example that maps directly onto my own problem. Any help would be appreciated.


Solution

  • Looks like you just need to:

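    • remove the trailing / from doc("cosmetics.xml")/cosmetics/cosmetic/ (a / has to be followed by another step)
    • wrap the inner for in {} (without the braces, the for ... return is treated as literal text inside the element instead of being evaluated)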

    Example:

    for $newcosmetic in doc("cosmetics.xml")/cosmetics/cosmetic
    return <cosmetic id="{$newcosmetic/@id}">{
        for $itemid in $newcosmetic/tokenize(@itemIDs,",")
        return <item id="{$itemid}"/>
    }</cosmetic>
    

    And if you want the outer cosmetics element in the output as well, just wrap everything in <cosmetics>{}</cosmetics>:

    <cosmetics>{
    for $newcosmetic in doc("cosmetics.xml")/cosmetics/cosmetic
    return <cosmetic id="{$newcosmetic/@id}">{
        for $itemid in $newcosmetic/tokenize(@itemIDs,",")
        return <item id="{$itemid}"/>
    }</cosmetic>
    }</cosmetics>
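
If you want to try this without having cosmetics.xml on disk, here is a minimal self-contained sketch. It assumes the data is bound inline to a variable ($input is just an illustrative name standing in for doc("cosmetics.xml")), and it names the inner element item so the result matches the output asked for in the question:

```xquery
xquery version "1.0";

(: Inline stand-in for doc("cosmetics.xml"); $input is an illustrative name. :)
let $input :=
  <cosmetics>
    <cosmetic id="0" itemIDs="1879053261,1879053932,1879054863"/>
    <cosmetic id="3" itemIDs="1879051065,1879057689"/>
  </cosmetics>
return
  <cosmetics>{
    for $cosmetic in $input/cosmetic
    return
      <cosmetic id="{$cosmetic/@id}">{
        (: tokenize() returns one string per comma-separated id :)
        for $itemid in tokenize($cosmetic/@itemIDs, ",")
        return <item id="{$itemid}"/>
      }</cosmetic>
  }</cosmetics>
```

Passing the attribute as the first argument, tokenize($cosmetic/@itemIDs, ","), should behave the same as the $cosmetic/tokenize(@itemIDs, ",") form used above; which one reads better is mostly a matter of taste.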
    

    Also something to note since you're new, instead of doing:

    id="{$newcosmetic/@id}"
    

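    you can just include $newcosmetic/@id in the inner {} (before the for, separated by a comma). The attribute is copied onto the new element as-is; it just has to come before any child content: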

    <cosmetics>{
    for $newcosmetic in doc("cosmetics.xml")/cosmetics/cosmetic
    return <cosmetic>{$newcosmetic/@id,
        for $itemid in $newcosmetic/tokenize(@itemIDs,",")
        return <item id="{$itemid}"/>
    }</cosmetic>
    }</cosmetics>
    

    Last note...

    The second arg of tokenize() is a regular expression, so if it's possible to have spaces on either side of the comma, consider something like:

    $newcosmetic/tokenize(@itemIDs,"\s*,\s*")
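
To see what the pattern changes, here is a small sketch using a made-up input string ($ids is an illustrative name, not something from your file):

```xquery
xquery version "1.0";

(: Compare a plain comma split with the whitespace-tolerant pattern. :)
let $ids := "1879053261 , 1879053932,  1879054863"
return (
  tokenize($ids, ","),        (: tokens keep the spaces around each comma :)
  tokenize($ids, "\s*,\s*")   (: tokens are the bare ids :)
)
```

Note that whitespace at the very start or end of the whole value is not next to a comma, so the pattern alone does not remove it.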
    

    The other option is to normalize the space on output:

    <item id="{normalize-space($itemid)}"/>
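
A quick sketch of that second option, again with a made-up value ($raw is hypothetical). Since normalize-space() is applied to each token, it also cleans up whitespace at the very start or end of the attribute, which the regex approach leaves in place:

```xquery
xquery version "1.0";

(: Split on the bare comma, then trim each token when building the element. :)
let $raw := "  1879051065 ,1879057689  "
for $itemid in tokenize($raw, ",")
return <item id="{normalize-space($itemid)}"/>
```

This should produce one item element per id with no stray spaces in the id attribute.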