Tags: python, json, python-3.x, xbrl

Mapping XBRL report instances to JSON


I'm looking to transform XBRL report instances, specifically financial reports such as those filed with the SEC, into Python dictionaries or JSON. I've spent time developing code using bs4 (Beautiful Soup), but ideally I'd like to leverage the open-source Arelle library.

My understanding is there is a plug-in for the Arelle software package called "saveLoadableOIM". There is general guidance published by XBRL.org; however, it stops short of practical implementation.

http://www.xbrl.org/Specification/xbrl-json/CR-2020-05-06/xbrl-json-CR-2020-05-06.html

I have found the documentation for command-line usage of Arelle to be out of date and inapplicable to Python 3.x. Could anyone provide guidance on how to run Arelle from the command line under Python 3 and, specifically, how to convert an SEC XBRL report instance into JSON? I'd like a model that adapts to future changes in the standard taxonomies, particularly GAAP:

https://www.sec.gov/info/edgar/edgartaxonomies.shtml

It would be particularly helpful to have sample code for mapping the following XBRL report instance of a MSFT 10-K into JSON:

https://www.sec.gov/Archives/edgar/data/789019/000156459018019062/msft-20180630.xml

If there are limitations in the existing Arelle library, I'd like to understand what these are.


Solution

  • I invoke Arelle under Python 3 with:

    python3 $HOME/Arelle/arelleCmdLine.py
    

    This is on Linux, and assumes I have Arelle checked out in my home directory as Arelle.

    To load a plugin, use --plugins and give it the name of a file under the Arelle/arelle/plugin directory (without the .py extension). For example, --plugins=saveLoadableOIM. You can then add --help and you should see additional options included in the help message.

    This works for me:

    python3 $HOME/Arelle/arelleCmdLine.py --plugins=saveLoadableOIM --saveLoadableOIM=msft.json -f https://www.sec.gov/Archives/edgar/data/789019/000156459018019062/msft-20180630.xml
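
If you would rather drive that same command from a Python script, here is a minimal sketch that shells out to it with subprocess. The helper names (build_arelle_command, convert_to_json) are my own and hypothetical; the flags are only the ones shown in the command above. It assumes Arelle is checked out at ~/Arelle unless you pass arelle_dir, and that the interpreter running the script has Arelle's dependencies installed.

```python
import os
import subprocess
import sys


def build_arelle_command(instance_url, output_json, arelle_dir=None):
    """Build the argv list for the saveLoadableOIM command line shown above."""
    if arelle_dir is None:
        # Assumption: Arelle is checked out in the home directory as "Arelle".
        arelle_dir = os.path.join(os.path.expanduser("~"), "Arelle")
    return [
        # Assumption: reuse the current interpreter instead of a literal "python3".
        sys.executable,
        os.path.join(arelle_dir, "arelleCmdLine.py"),
        "--plugins=saveLoadableOIM",
        "--saveLoadableOIM=" + output_json,
        "-f",
        instance_url,
    ]


def convert_to_json(instance_url, output_json, arelle_dir=None):
    """Run Arelle as a child process; raises CalledProcessError if it fails."""
    command = build_arelle_command(instance_url, output_json, arelle_dir)
    subprocess.run(command, check=True)
    return output_json
```

Calling convert_to_json with the MSFT instance URL and "msft.json" should run the same command as above. Passing the arguments as a list avoids any shell quoting issues with the URL.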
    

    Example of extracting data using the awesome jq:

    jq '[.facts[] | select( .dimensions.concept | test(":GrossProfit$") )] | sort_by(.dimensions.period)[-1]' msft.json
    

    This gets the most recent GrossProfit value:

    {
      "value": "20343000000",
      "decimals": -6,
      "dimensions": {
        "concept": "us-gaap:GrossProfit",
        "entity": "cik:0000789019",
        "period": "2018-04-01T00:00:00/2018-07-01T00:00:00",
        "unit": "iso4217:USD"
      }
    }
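
Since the question asks for Python dictionaries, the same lookup can be done without jq. This is a rough sketch that mirrors the jq filter above, using only the fields visible in the sample output (value, dimensions.concept, dimensions.period). The function names are hypothetical. It accepts "facts" as either a list or an object keyed by fact id, because I am not certain which shape a given plugin version writes.

```python
import json


def load_report(path):
    """Read the JSON file written by saveLoadableOIM into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_facts(report):
    """Return the facts as a list, whether stored as a list or a dict."""
    facts = report.get("facts", [])
    return list(facts.values()) if isinstance(facts, dict) else list(facts)


def latest_fact(report, local_name):
    """Return the fact for a concept local name with the greatest period string.

    Like the jq filter, this compares period values as plain strings.
    Returns None when no fact matches.
    """
    suffix = ":" + local_name
    matches = [
        fact for fact in iter_facts(report)
        if fact.get("dimensions", {}).get("concept", "").endswith(suffix)
    ]
    if not matches:
        return None
    return sorted(matches, key=lambda f: f["dimensions"].get("period", ""))[-1]
```

With the file produced earlier, latest_fact(load_report("msft.json"), "GrossProfit") should pick out the same fact as the jq command.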
    

    I should note that the xBRL-JSON specification is not yet finalised, and it's likely that the format of this JSON may change slightly over time. I'd expect Arelle to be updated to the final version once it's available.