Querying the entity identifiers present in an XBRL instance with MarkLogic XQuery


I am using MarkLogic's XQuery console to experiment with querying my XBRL documents.

This is what my XBRL document looks like:

<xbrli:xbrl xml:lang="nl" xmlns:link="http://www.xbrl.org/2003/linkbase" xmlns:bd-alg="http://www.nltaxonomie.nl/nt11/bd/20161207/dictionary/bd-algemeen" xmlns:bd-bedr="http://www.nltaxonomie.nl/nt11/bd/20161207/dictionary/bd-bedrijven" xmlns:bd-bedr-tuple="http://www.nltaxonomie.nl/nt11/bd/20161207/dictionary/bd-bedr-tuples" xmlns:bd-dim-mem="http://www.nltaxonomie.nl/nt11/bd/20161207/dictionary/bd-domain-members" xmlns:bd-dim-dim="http://www.nltaxonomie.nl/nt11/bd/20161207/validation/bd-axes" xmlns:xbrldi="http://xbrl.org/2006/xbrldi" xmlns:xbrli="http://www.xbrl.org/2003/instance" xmlns:iso4217="http://www.xbrl.org/2003/iso4217" xmlns:xlink="http://www.w3.org/1999/xlink">
    <link:schemaRef xlink:type="simple" xlink:href="http://www.nltaxonomie.nl/nt11/bd/20161207/entrypoints/bd-rpt-vpb-aangifte-2016.xsd"/>
    <xbrli:context id="ContextDurationDeclarant">
        <xbrli:entity>
            <xbrli:identifier scheme="www.belastingdienst.nl/identificatie">800004449</xbrli:identifier>
        </xbrli:entity>
        <xbrli:period>
            <xbrli:startDate>2016-01-01</xbrli:startDate>
            <xbrli:endDate>2016-12-31</xbrli:endDate>
        </xbrli:period>
        <xbrli:scenario>
            <xbrldi:explicitMember dimension="bd-dim-dim:PartyDimension">bd-dim-mem:Declarant</xbrldi:explicitMember>
        </xbrli:scenario>
    </xbrli:context>
</xbrli:xbrl>

As you can see it is XML.

The query I want to run is:

xquery version "1.0-ml";
declare namespace lang="nl";
declare namespace link="http://www.xbrl.org/2003/linkbase";
declare namespace bd-alg="http://www.nltaxonomie.nl/nt11/bd/20161207/dictionary/bd-algemeen";
declare namespace bd-bedr="http://www.nltaxonomie.nl/nt11/bd/20161207/dictionary/bd-bedrijven";
declare namespace bd-bedr-tuple="http://www.nltaxonomie.nl/nt11/bd/20161207/dictionary/bd-bedr-tuples";
declare namespace bd-dim-mem="http://www.nltaxonomie.nl/nt11/bd/20161207/dictionary/bd-domain-members";
declare namespace bd-dim-dim="http://www.nltaxonomie.nl/nt11/bd/20161207/validation/bd-axes";
declare namespace xbrldi="http://xbrl.org/2006/xbrldi";
declare namespace xbrli="http://www.xbrl.org/2003/instance";
declare namespace iso4217="http://www.xbrl.org/2003/iso4217";
declare namespace xlink="http://www.w3.org/1999/xlink";
declare namespace type="simple";
declare namespace href="http://www.nltaxonomie.nl/nt11/bd/20161207/entrypoints/bd-rpt-vpb-aangifte-2016.xsd";

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Database dump</title>
  </head>
  <body>
  <b>XML Content</b>
{
    for $xbrli:context in doc("test.xml")/xbrli:xbrl
    return
    <pre>
        Identifier: { $xbrli:context/xbrli:entity/xbrli:identifier }
    </pre>
}
  </body>
</html>

I don't see the problem, but XQuery apparently does, because I only get empty sequences. Please help me!


Solution

  • Both Martin and Loren have a point, and I also have a few more comments specific to XBRL.

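    You need to do both: add xbrli:context to the path in the for clause, because xbrli:entity is a child of xbrli:context and not of xbrli:xbrl, and append /string() to the identifier path so that only the string value is returned (otherwise the xbrli:identifier element itself is nested in the output).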

    {
        for $xbrli:context in doc("test.xml")/xbrli:xbrl/xbrli:context
        return
        <pre>
            Identifier: { $xbrli:context/xbrli:entity/xbrli:identifier/string() }
        </pre>
    }
    

    The xbrli:context step belongs in the for clause rather than in the return clause because, in the general case, an XBRL instance contains several contexts, whereas a conformant instance has only one entity per context. Iterating over the contexts therefore yields one identifier per iteration.
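
    To see why the original path came back empty, the following minimal sketch compares the two paths on an inline copy of the instance. It assumes a trimmed-down document (period and scenario omitted) built in memory instead of loaded with doc("test.xml"); the variable name $instance is only illustrative.

    ```xquery
    xquery version "1.0-ml";
    declare namespace xbrli = "http://www.xbrl.org/2003/instance";

    (: Assumption: a trimmed inline copy of the instance from the question. :)
    let $instance :=
      <xbrli:xbrl>
        <xbrli:context id="ContextDurationDeclarant">
          <xbrli:entity>
            <xbrli:identifier scheme="www.belastingdienst.nl/identificatie">800004449</xbrli:identifier>
          </xbrli:entity>
        </xbrli:context>
      </xbrli:xbrl>
    return (
      (: Path from the question: xbrli:entity is not a child of xbrli:xbrl, so nothing matches. :)
      count($instance/xbrli:entity/xbrli:identifier),
      (: Path that goes through xbrli:context: matches the single identifier. :)
      count($instance/xbrli:context/xbrli:entity/xbrli:identifier)
    )
    ```

    The first count is 0 and the second is 1, which matches the empty sequences seen in the question.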

    Also, you may want to eliminate duplicates, because different contexts may contain the same entity. In practice, an XBRL instance very often contains only one entity across all its contexts. This is the case for filings to the SEC and to many other regulatory authorities, although XBRL itself places no such restriction, so other XBRL datasets may have several entities per instance.

    With distinct-values, /string() becomes superfluous, because the function atomizes its argument and returns atomic values rather than elements.

    {
      for $entity in distinct-values(
        doc("test.xml")/xbrli:xbrl/xbrli:context/xbrli:entity/xbrli:identifier
      )
      return <pre>Identifier: { $entity }</pre>
    }
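
    As a rough illustration of the duplicate elimination, the sketch below builds a hypothetical instance with two contexts (the ids c1 and c2 are made up, and period elements are left out) that share the same entity identifier, then compares the number of identifier elements with the number of distinct values.

    ```xquery
    xquery version "1.0-ml";
    declare namespace xbrli = "http://www.xbrl.org/2003/instance";

    (: Assumption: two illustrative contexts that refer to the same entity. :)
    let $instance :=
      <xbrli:xbrl>
        <xbrli:context id="c1">
          <xbrli:entity>
            <xbrli:identifier scheme="www.belastingdienst.nl/identificatie">800004449</xbrli:identifier>
          </xbrli:entity>
        </xbrli:context>
        <xbrli:context id="c2">
          <xbrli:entity>
            <xbrli:identifier scheme="www.belastingdienst.nl/identificatie">800004449</xbrli:identifier>
          </xbrli:entity>
        </xbrli:context>
      </xbrli:xbrl>
    let $identifiers := $instance/xbrli:context/xbrli:entity/xbrli:identifier
    return (
      count($identifiers),                   (: 2 identifier elements :)
      count(distinct-values($identifiers))   (: 1 distinct atomized value :)
    )
    ```

    Note that distinct-values compares only the atomized text, so two identifiers with the same value but different scheme attributes would be collapsed into one. If that matters for your data, compare the scheme as well.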
    

    In fact, since xbrli:identifier only appears at these places in a conformant XBRL instance, you can also use the descendant-or-self axis (abbreviated as //) directly with this QName:

    {
      for $entity in distinct-values(
        doc("test.xml")//xbrli:identifier
      )
      return <pre>Identifier: { $entity }</pre>
    }
    

    Finally, in many filings, an XBRL fact also reports a friendlier name for the entity. In SEC filings this fact is dei:EntityRegistrantName, while the entity identifier gives you the CIK, which is also reported as the fact dei:EntityCentralIndexKey.
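
    A possible way to read that name is sketched below. It assumes an SEC filing stored under the hypothetical URI "sec-filing.xml" (the Dutch instance in the question has no dei facts, so the query would return nothing for it) and that the fact is a direct child of xbrli:xbrl. Because the dei namespace URI differs between taxonomy releases, the sketch matches on the local name with the *: wildcard instead of declaring a dei prefix.

    ```xquery
    xquery version "1.0-ml";
    declare namespace xbrli = "http://www.xbrl.org/2003/instance";

    (: Assumption: "sec-filing.xml" is a placeholder URI for an SEC instance in the database. :)
    (: *:EntityRegistrantName matches that local name in any namespace. :)
    for $name in distinct-values(
      doc("sec-filing.xml")/xbrli:xbrl/*:EntityRegistrantName
    )
    return <pre>Entity name: { $name }</pre>
    ```

    If you know the exact taxonomy version, declaring the dei namespace explicitly is stricter than the wildcard and avoids matching an unrelated element with the same local name.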