Selecting elements with specific namespaced attributes


My problem is specifically about parsing an Inkscape (XML) file, but its solution should be applicable to any XML document, so I feel it's relevant to Stack Overflow.

I'm trying to use the Nokogiri CSS selectors to get all the <g> elements that have the attribute inkscape:groupmode="layer". But the colon is causing the error:

Nokogiri::CSS::SyntaxError: unexpected ':' after 'inkscape'

My XML document looks like:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://creativecommons.org/ns#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd" xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape" width="744.09448819" height="1052.3622047" id="svg3720" version="1.1" inkscape:version="0.48.1 r9760" sodipodi:docname="test.svg">
  <defs id="defs3722">
    <inkscape:perspective sodipodi:type="inkscape:persp3d" inkscape:vp_x="0 : 526.18109 : 1" inkscape:vp_y="0 : 1000 : 0" inkscape:vp_z="744.09448 : 526.18109 : 1" inkscape:persp3d-origin="372.04724 : 350.78739 : 1" id="perspective3728"/>
  </defs>
  <sodipodi:namedview id="base" pagecolor="#ffffff" bordercolor="#666666" borderopacity="1.0" inkscape:pageopacity="0.0" inkscape:pageshadow="2" inkscape:zoom="0.35" inkscape:cx="375" inkscape:cy="634.28571" inkscape:document-units="px" inkscape:current-layer="g2818" showgrid="false" inkscape:window-width="550" inkscape:window-height="483" inkscape:window-x="66" inkscape:window-y="471" inkscape:window-maximized="0"/>
  <metadata id="metadata3725">
    <rdf:RDF>
      <cc:Work rdf:about="">
        <dc:format>image/svg+xml</dc:format>
        <dc:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
        <dc:title/>
      </cc:Work>
    </rdf:RDF>
  </metadata>
  <g inkscape:label="Layer 1" inkscape:groupmode="layer" id="layer1">
    <rect style="fill:#d2e149;fill-opacity:1;stroke:none" id="rect2812" width="211.42857" height="128.57143" x="168.57143" y="215.21933" ry="64.285713"/>
  </g>
  <g inkscape:label="Layer 1 copy copy" inkscape:groupmode="layer" id="g2818">
    <rect style="fill:#d2e149;fill-opacity:1;stroke:none" id="rect2820" width="211.42857" height="128.57143" x="145.71428" y="615.2193" ry="64.285713"/>
  </g>
</svg>

My selector looks like:

nokogiri_document.css('[inkscape:groupmode="layer"]').to_html

I also tried replacing the colon with a pipe.

How do I write the CSS selector to work on the inkscape:groupmode attribute...or for that matter any foo:bar attribute?


Solution

  • Use XPath, specifying the namespace for the g elements. Since your root element declares xmlns:svg with the same URI as the default namespace (xmlns), you can use svg as the prefix:

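    require 'nokogiri'

    # Nokogiri registers the namespaces declared on the root element,
    # so the svg: and inkscape: prefixes can be used directly in XPath.
    doc = Nokogiri::XML(File.read('contents.xml'))
    layers = doc.xpath('//svg:g[@inkscape:groupmode="layer"]')

    p layers.map { |layer| layer['id'] }
    #=> ["layer1", "g2818"]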
    

    Decoded, the above XPath says:

    • // - At any level of the document
    • svg:g - …find g elements with a namespace matching the svg namespace
    • […] - …but only if the condition inside the brackets holds
    • @inkscape:groupmode - …there is an attribute (@) named groupmode with a namespace matching inkscape
    • ="layer" - …and the value of this attribute is the text layer.

    Alternatively, if you only need to read this file (and not manipulate and re-save it), you can use the gross-but-simplifying hack of removing all namespaces. Once they are gone, a plain CSS selector on the unprefixed attribute name works:

    doc.remove_namespaces!
    p doc.css('g[groupmode="layer"]').map{ |g| g['id'] }
    #=> ["layer1", "g2818"]