Identifying XBRL Documents

After reading about XBRL validation, I think it would be a great feature to add to a work-in-progress program. However, due to performance limitations, I can't read the entire document into the system for validation: a large number of documents may be flowing into the system for processing, or a single document could itself be large.

I thought that by reading the first few bytes of the document, we could identify whether or not it is an XBRL document, assuming that in an XBRL document the first few bytes after the XML declaration will always be either "xbrl" or "xbrli:xbrl".

Would it be safe to assume that an XBRL document is identified by its root tag being either "xbrl" or "xbrli:xbrl"? Or is there a better way to identify an XBRL document without having to parse the entire document?

Thanks!

Solution

It is not safe to assume this, though if a 95% hit rate is good enough for you, then it's fine.

It would be almost 100% safe if you checked the prefix explicitly:

check for xmlns:prefix="http://www.xbrl.org/2003/instance" together with a root element <prefix:xbrl ...>, or
check for xmlns="http://www.xbrl.org/2003/instance" together with a root element <xbrl ...>

You may be able to find a regular expression that matches those. The point is that you cannot assume the prefix is always empty or xbrli.
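A minimal Python sketch of that explicit prefix check, assuming the first 4096 bytes (an arbitrary default) contain the complete root start tag and that the file is UTF-8 or another ASCII-compatible encoding. The names `looks_like_xbrl` and `file_looks_like_xbrl` are hypothetical. It strips comments and processing instructions (including the XML declaration), takes the first start tag, splits off its prefix, and then checks that the same prefix, or the default namespace when there is no prefix, is bound to http://www.xbrl.org/2003/instance. Like any regular-expression approach it is only a heuristic: for example, a ">" inside a root attribute value, or a root tag longer than the bytes read, makes it return False.

```python
import re

XBRL_INSTANCE_NS = "http://www.xbrl.org/2003/instance"

# Comments and processing instructions (incl. the XML declaration) are removed
# so markup inside them is not mistaken for the root element.
_COMMENT_OR_PI = re.compile(r"<!--.*?-->|<\?.*?\?>", re.DOTALL)
# First start tag that is not a declaration (<!DOCTYPE ...>) or an end tag.
_START_TAG = re.compile(r"<(?![!/?])([A-Za-z_][\w.\-]*(?::[A-Za-z_][\w.\-]*)?)([^>]*)>")
# ="value" or ='value' following an attribute name.
_QUOTED_VALUE = r"""\s*=\s*(["'])(.*?)\1"""


def looks_like_xbrl(head: bytes) -> bool:
    """Heuristic: is the root element {XBRL_INSTANCE_NS}xbrl, judged from `head` only?"""
    text = _COMMENT_OR_PI.sub("", head.decode("utf-8", errors="replace"))
    match = _START_TAG.search(text)
    if match is None:
        return False  # no complete root start tag within the bytes read
    qname, attrs = match.group(1), match.group(2)
    prefix, _, local = qname.rpartition(":")
    if local != "xbrl":
        return False
    # The prefix actually used on the root must be bound to the XBRL namespace.
    if prefix:
        decl = r"\bxmlns:" + re.escape(prefix) + _QUOTED_VALUE
    else:
        decl = r"\bxmlns" + _QUOTED_VALUE
    ns = re.search(decl, attrs, re.DOTALL)
    return ns is not None and ns.group(2) == XBRL_INSTANCE_NS


def file_looks_like_xbrl(path: str, head_size: int = 4096) -> bool:
    # Only the first head_size bytes are read, never the whole file.
    with open(path, "rb") as f:
        return looks_like_xbrl(f.read(head_size))
```

Because the prefix is taken from the root tag itself, a document using any prefix (not just none or xbrli) is accepted, as long as the binding appears on the root element.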

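The safe way to do it is to use a SAX parser, which does not have to parse the entire document: it reports elements as it streams through the input, so you can stop as soon as the root element has been seen. See for example this question: Determine root Element during SAX parsing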
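A minimal sketch of the SAX approach using Python's standard xml.sax module (the answer does not name a language, so Python is an assumption here). With namespace processing turned on, the parser resolves whatever prefix the document uses to its namespace URI, so the check no longer depends on the prefix at all. The handler raises a hypothetical `_RootFound` exception from `startElementNS` to abort parsing once the root element is reported; the exception propagates out of `parse()`, so the parser stops reading shortly after the root start tag instead of consuming the whole document.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XBRL_INSTANCE_NS = "http://www.xbrl.org/2003/instance"


class _RootFound(Exception):
    """Hypothetical helper: raised to abort parsing once the root is seen."""

    def __init__(self, uri, local_name):
        super().__init__(uri, local_name)
        self.uri = uri
        self.local_name = local_name


class _RootHandler(ContentHandler):
    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, name is (namespace URI, local name),
        # so the prefix used in the document no longer matters.
        uri, local_name = name
        raise _RootFound(uri, local_name)


def is_xbrl_instance(stream) -> bool:
    """Return True if the root element is {XBRL_INSTANCE_NS}xbrl."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(_RootHandler())
    try:
        parser.parse(stream)
    except _RootFound as root:
        return root.uri == XBRL_INSTANCE_NS and root.local_name == "xbrl"
    except xml.sax.SAXParseException:
        # Not well-formed before (or at) the root start tag.
        return False
    return False  # defensive: a well-formed document always has a root element


if __name__ == "__main__":
    sample = b'<x:xbrl xmlns:x="http://www.xbrl.org/2003/instance"><x:context/></x:xbrl>'
    print(is_xbrl_instance(io.BytesIO(sample)))
```

For a file on disk, pass an open binary handle, e.g. `with open(path, "rb") as f: is_xbrl_instance(f)`.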