XBRL label names differ between instance and calculation documents


I have what is, probably, a very stupid question, but I'm stumped by it and would appreciate any help. I'm trying to gather xbrl data from SEC filings using Python and BeautifulSoup. One problem I'm having is that certain line items are referred to differently in the instance document and the calculation linkbase.

As a concrete example, take this recent 10-K from PHI Group Inc.: https://www.sec.gov/Archives/edgar/data/704172/000149315221015100/0001493152-21-015100-index.htm

A line item with the xbrl tag 'WriteoffOfFinancingCosts' shows up as <PHIL:WriteoffOfFinancingCosts ...> in the instance document (along with a value and contexts) but shows up as 'loc_PHILWriteoffOfFinancingCosts' in the calculation linkbase. But this relationship, 'PHIL:' = 'loc_PHIL', isn't standard across XBRL filings.

How does one know what prefix will be added to a tag in the calculation linkbase so that (with the prefix removed) it can be reliably tied back to a tag in the instance document? I can think of various workarounds, but it just seems silly; isn't there somewhere I can look in the calculation linkbase or elsewhere that will just TELL me exactly what prefix is added?

As some (possibly relevant) nuance: lots of tags in lots of filings, of course, have a prefix like 'us-gaap', indicating the us-gaap namespace, but that doesn't seem to guarantee that a tag in the calculation linkbase will therefore look like 'us-gaapAccountsPayableCurrent' and not 'loc_us-gaapAccountsPayableCurrent' or 'us-gaap:AccountsPayableCurrent' or some other variation of the basic pattern, all of which, of course, look different to BeautifulSoup. Can anyone point me in the right direction?


Solution

  • PHIL:WriteoffOfFinancingCosts is the name of the XBRL concept, while loc_PHILWriteoffOfFinancingCosts is the (calculation linkbase) label of the locator pointing to the concept PHIL:WriteoffOfFinancingCosts. This mechanism is the way linkbases connect concepts together: each locator is a "proxy" to a concept.

    loc_PHILWriteoffOfFinancingCosts is thus an internal detail of the calculation linkbase. The names of linkbase labels are, in principle, free to choose. Some conventions have established themselves (such as prefixing with loc_), but I would not rely on them. Rather, you can "follow the trail" by looking at the definition of the linkbase label:

    <link:loc xlink:type="locator"
              xlink:href="phil-20200630.xsd#PHIL_WriteoffOfFinancingCosts"
              xlink:label="loc_PHILWriteoffOfFinancingCosts" />
    

    Thanks to the xlink:href attribute, you can see that this locator points to the concept with the ID PHIL_WriteoffOfFinancingCosts in the file phil-20200630.xsd:

    <element id="PHIL_WriteoffOfFinancingCosts"
             name="WriteoffOfFinancingCosts" .../>
    

    And you can see that the local name of this concept is WriteoffOfFinancingCosts. The prefix PHIL: never appears in the concept definition, because every concept declared in that file belongs to the schema's target namespace. How do we know which namespace that is? At the top of the xsd file, it says targetNamespace="http://phiglobal.com/20200630", and the instance file phil-20200630.xml binds the prefix PHIL: to that same namespace with xmlns:PHIL="http://phiglobal.com/20200630".
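    As a rough illustration, here is a minimal sketch of those first hops (locator label to href, then href fragment to the element declaration and the schema's target namespace) using only Python's standard library rather than BeautifulSoup. The two XML strings are trimmed-down stand-ins built from the snippets above, not the real files, and the function names locator_targets and concept_name are made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

LINK_NS = "http://www.xbrl.org/2003/linkbase"
XLINK_NS = "http://www.w3.org/1999/xlink"
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Stand-in for the calculation linkbase (assumption: the real file has many more elements).
CAL_XML = """<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
               xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:calculationLink xlink:type="extended">
    <link:loc xlink:type="locator"
              xlink:href="phil-20200630.xsd#PHIL_WriteoffOfFinancingCosts"
              xlink:label="loc_PHILWriteoffOfFinancingCosts" />
  </link:calculationLink>
</link:linkbase>"""

# Stand-in for phil-20200630.xsd (assumption: other attributes and elements omitted).
XSD_XML = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://phiglobal.com/20200630">
  <element id="PHIL_WriteoffOfFinancingCosts" name="WriteoffOfFinancingCosts"/>
</schema>"""


def locator_targets(linkbase_xml):
    """Map each locator label to (schema file, element id) taken from xlink:href."""
    root = ET.fromstring(linkbase_xml)
    targets = {}
    for loc in root.iter(f"{{{LINK_NS}}}loc"):
        label = loc.get(f"{{{XLINK_NS}}}label")
        href = loc.get(f"{{{XLINK_NS}}}href")
        # The part before '#' is the schema location, the part after is the element id.
        schema_file, element_id = urldefrag(href)
        targets[label] = (schema_file, element_id)
    return targets


def concept_name(schema_xml, element_id):
    """Return (target namespace, local name) for the element with the given id, or None."""
    root = ET.fromstring(schema_xml)
    for el in root.iter(f"{{{XSD_NS}}}element"):
        if el.get("id") == element_id:
            return root.get("targetNamespace"), el.get("name")
    return None


targets = locator_targets(CAL_XML)
schema_file, element_id = targets["loc_PHILWriteoffOfFinancingCosts"]
# In real use, schema_file would be fetched relative to the linkbase's own URL;
# here the stand-in XSD_XML is used directly.
concept = concept_name(XSD_XML, element_id)
print(schema_file, element_id, concept)
```

    Note that this sketch keys the dictionary on the label alone. Labels are only guaranteed to be meaningful within the extended link that contains them, so a more careful version would probably build one map per calculationLink.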

    It is common practice to choose concept IDs with the prefix followed by underscore followed by the local name. Some users rely on it, but following the levels of indirection, in spite of being more complex, is "safer": linkbase label loc_PHILWriteoffOfFinancingCosts -> concept ID PHIL_WriteoffOfFinancingCosts -> concept local name WriteoffOfFinancingCosts -> concept's fully qualified name PHIL:WriteoffOfFinancingCosts.
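    For the last step, tying the concept back to the instance document, one option is to match on the namespace URI plus local name rather than on the prefix, since the prefix is only a per-document abbreviation. The sketch below assumes the namespace and local name have already been read from the schema; the instance string is a stand-in with placeholder attribute values and a placeholder fact value, and find_facts and prefixed_names are hypothetical helper names.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for phil-20200630.xml. The contextRef, unitRef and the value are
# placeholders for illustration, not data from the actual filing.
INSTANCE_XML = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:PHIL="http://phiglobal.com/20200630">
  <PHIL:WriteoffOfFinancingCosts contextRef="ctx_placeholder"
                                 unitRef="unit_placeholder">0</PHIL:WriteoffOfFinancingCosts>
</xbrl>"""


def find_facts(instance_xml, namespace_uri, local_name):
    """Find facts by namespace URI and local name, independent of the prefix used."""
    root = ET.fromstring(instance_xml)
    # ElementTree exposes tags as '{namespace}localname', so no prefix is needed.
    return list(root.iter(f"{{{namespace_uri}}}{local_name}"))


def prefixed_names(instance_xml, namespace_uri, local_name):
    """List the prefix:localname spellings this instance document uses for the concept."""
    source = io.BytesIO(instance_xml.encode("utf-8"))
    prefixes = {
        prefix
        for _, (prefix, uri) in ET.iterparse(source, events=("start-ns",))
        if uri == namespace_uri
    }
    # An empty prefix means the namespace is the default one, so there is no 'prefix:' part.
    return sorted(f"{p}:{local_name}" if p else local_name for p in prefixes)


namespace_uri = "http://phiglobal.com/20200630"
local_name = "WriteoffOfFinancingCosts"

facts = find_facts(INSTANCE_XML, namespace_uri, local_name)
names = prefixed_names(INSTANCE_XML, namespace_uri, local_name)
print(len(facts), names)
```

    If you would rather keep searching by the prefixed tag name, prefixed_names gives the spelling used in that particular instance document instead of a guessed one.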

    You probably notice how complex this is. In fact, this is the reason why it is worth using an XBRL processor, which will do all of this for you.