How to interpret [73] at end of an xml2 error message


I am using R to parse 60 large (0.5 GB each) XML files from the same source. I have code that works for all files except one, which returns this error message:

Error in doc_parse_file(con, encoding = encoding, as_html = as_html, options = options) : 
  expected '>' [73]

This comes from read_xml.character().

The message makes it clear that a character is missing from the file, but to help me find it, it would be good to know: what does the [73] refer to?

(My first guess was line 73 of the file, but there's nothing obviously wrong with that).

I can't post a reproducible example because of the size of the file and because it is commercial-in-confidence, so I'd be happy with just a pointer on the error message.


Solution

  • The R package xml2 is basically a wrapper for the libxml2 parser. The libxml2 library defines a bunch of error codes. Here's a subset of those codes:

    XML_ERR_PUBID_REQUIRED = 71
    XML_ERR_LT_REQUIRED = 72
    XML_ERR_GT_REQUIRED = 73
    XML_ERR_LTSLASH_REQUIRED = 74
    XML_ERR_EQUAL_REQUIRED = 75
    

    So the number you see in brackets in R is the error code returned by the libxml2 library. In this case, error 73 means that a greater-than symbol (GT, '>') was expected but not found.
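
To make the mapping concrete, here is a minimal sketch in base R that pulls the bracketed number off the end of an error message and looks it up in a small table built from the five codes listed above. The helper names `extract_error_code()` and `describe_error()` are hypothetical (they are not part of xml2), and the sketch assumes the code always appears as a trailing `[NN]`, as it does in the message from the question.

```r
# Hypothetical helper (not part of xml2): return the trailing "[NN]" code
# from an error message as an integer, or NA if there is none.
extract_error_code <- function(msg) {
  m <- regmatches(msg, regexpr("\\[[0-9]+\\][[:space:]]*$", msg))
  if (length(m) == 0) return(NA_integer_)
  as.integer(gsub("[^0-9]", "", m))
}

# Lookup table limited to the subset of libxml2 codes quoted in the answer.
libxml2_codes <- c(
  "71" = "XML_ERR_PUBID_REQUIRED",
  "72" = "XML_ERR_LT_REQUIRED",
  "73" = "XML_ERR_GT_REQUIRED",
  "74" = "XML_ERR_LTSLASH_REQUIRED",
  "75" = "XML_ERR_EQUAL_REQUIRED"
)

# Hypothetical helper: pair the numeric code with its libxml2 name
# (name is NA for codes outside the small table above).
describe_error <- function(msg) {
  code <- extract_error_code(msg)
  list(code = code, name = unname(libxml2_codes[as.character(code)]))
}

# Optional: try it on a deliberately malformed snippet if xml2 is installed.
# The end tag "</b " is missing its closing '>'.
if (requireNamespace("xml2", quietly = TRUE)) {
  result <- tryCatch(
    xml2::read_xml("<a><b></b </a>"),
    error = function(e) describe_error(conditionMessage(e))
  )
  print(result)
}
```

The same `tryCatch()` pattern could be wrapped around the real `read_xml()` call in a loop over the 60 files, so that a failing file is logged with its code instead of stopping the run.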

    Since this doesn't tell you exactly where the error occurred, you might want to use an xml validator to get more diagnostic information about what exactly happened in the file.
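
One way to get that extra diagnostic information is `xmllint`, the command-line tool that ships with libxml2. The sketch below calls it from R with `--noout` (check well-formedness without echoing the document) and returns whatever it reports. It assumes `xmllint` is installed and on the PATH, and the function name `locate_xml_errors()` is hypothetical. The diagnostics usually include the file name and line number, but that format is not guaranteed here.

```r
# Hypothetical helper: run xmllint on a file and return its diagnostics
# as a character vector (empty if xmllint is unavailable or reports nothing).
locate_xml_errors <- function(path) {
  if (!nzchar(Sys.which("xmllint"))) {
    message("xmllint not found on PATH; skipping check")
    return(character(0))
  }
  # A non-zero exit status makes system2() warn, so suppress the warning
  # and keep only the captured stdout/stderr lines.
  out <- suppressWarnings(
    system2("xmllint", c("--noout", shQuote(path)),
            stdout = TRUE, stderr = TRUE)
  )
  as.character(out)
}

# Usage on a small, deliberately malformed temporary file.
tmp <- tempfile(fileext = ".xml")
writeLines(c("<a>", "  <b></b ", "</a>"), tmp)
diagnostics <- locate_xml_errors(tmp)
print(diagnostics)
```

For a 0.5 GB file this may take a while and use a lot of memory, so it is worth running only on the one file that fails.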