java android xml metrics code-complexity

Is there a well-defined way to measure size and/or complexity of XML files?


Lines of code (LOC) is one of the most widely used metrics for measuring the size of program source code. It works well for Java or C code. However, in one of our current research projects, we need to measure the size of code in XML files. LOC does not seem to be a good fit for this purpose because the XML format is so flexible: the same document can be written on a single line or spread over many.

I was wondering whether there is a good way to measure the size or complexity of XML code. I have searched online, and most published research focuses on defining the complexity of XML schemas or DTDs rather than of XML files themselves, for example: Metrics for XML Document Collections

I also found tools and libraries that can count or list nodes or elements by tag name, for example: Counting number of element in xml file and Simplest way to get XML node count

However, our research does not depend on the names of tags or elements. We only need a well-defined metric for the size or complexity of code in XML files, especially Android layout files and AndroidManifest.xml files.


Solution

  • Well-defined ways to measure XML files

    Size

    • XML file byte count
    • Text content character count
    • {Element|Attribute|DOM node} count
    • Aggregates of above measures

    Complexity

    • Unique {element|attribute} name count
    • Maximum or average {depth|width} of element tree hierarchy
    • Directed acyclic graph (DAG) measures for ID/IDREF reference structures
    • Size of smallest schema that would validate the XML
      • Limited to a specific schema standard {XSD|DTD|RelaxNG|...}
      • Limited to a specific schema feature subset (e.g., no xsd:any, ...)
    • Kolmogorov complexity of XML file as a string
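
Several of the measures listed above can be computed in one pass over the DOM. The following is a minimal sketch rather than a definitive implementation: the class name XmlMetrics and its fields are hypothetical, it uses only the JDK's built-in DOM parser and java.util.zip, and it makes a few assumptions that should be adjusted to the study at hand. Whitespace-only text is ignored, namespace declarations such as xmlns:android count as ordinary attributes (the parser is left non-namespace-aware), and the root element has depth 1. Kolmogorov complexity is not computable, so the gzip-compressed size is used here only as a rough upper-bound proxy for it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects a few size and complexity measures for one XML document.
final class XmlMetrics {
    int byteCount;      // size of the document as UTF-8 bytes
    int gzipBytes;      // compressed size, a crude stand-in for Kolmogorov complexity
    int textChars;      // characters of trimmed text content
    int elements;       // number of element nodes
    int attributes;     // number of attributes, including xmlns declarations
    int maxDepth;       // depth of the deepest element, root = 1
    final Set<String> elementNames = new HashSet<>(); // unique qualified element names

    static XmlMetrics measure(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        XmlMetrics m = new XmlMetrics();
        m.byteCount = bytes.length;
        m.gzipBytes = gzipLength(bytes);
        m.walk(doc.getDocumentElement(), 1);
        return m;
    }

    // Depth-first traversal that updates the counters for one element and its subtree.
    private void walk(Element e, int depth) {
        elements++;
        attributes += e.getAttributes().getLength();
        elementNames.add(e.getNodeName());
        maxDepth = Math.max(maxDepth, depth);
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                walk((Element) child, depth + 1);
            } else if (child.getNodeType() == Node.TEXT_NODE
                    || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                // Assumption: indentation and other surrounding whitespace is not content.
                textChars += child.getNodeValue().trim().length();
            }
        }
    }

    private static int gzipLength(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.size();
    }

    public static void main(String[] args) throws Exception {
        // Made-up layout-like input, used only to exercise the code.
        String xml = "<LinearLayout orientation=\"vertical\">"
                + "<TextView text=\"Hi\"/><Button text=\"OK\">Go</Button>"
                + "</LinearLayout>";
        XmlMetrics m = measure(xml);
        System.out.println("bytes=" + m.byteCount + " gzip=" + m.gzipBytes
                + " elements=" + m.elements + " attributes=" + m.attributes
                + " textChars=" + m.textChars + " maxDepth=" + m.maxDepth
                + " uniqueNames=" + m.elementNames.size());
    }
}
```

Because the counts come from the parsed tree rather than from the raw text, they do not change when the same file is re-indented or re-wrapped, which is the weakness of LOC described in the question. The byte count and gzip size do depend on layout, so the file would need to be normalized first if those two measures are to be compared across files.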