Parsing XML files with the JDK SAX parser always fails when a containing folder name contains the character #

See subject. The parser was run on a file with the following path (on Mac OS X):

/Volumes/RobExtL/xmltests/hurz#1/hurz.xml

This is a valid path.

The exception message indicates that the JDK parser cannot deal with the "#" in the path and cuts off everything from that character onward.

The same file can be parsed using JDOM2 without any problems. The reason I am not using JDOM2 here is that this is a utility that only determines the root element name using SAX to avoid parsing potentially huge files, which in this case is performance-critical.

The stack trace should contain all the remaining information needed.

Exception in thread "main" java.io.FileNotFoundException: /Volumes/RobExtL/xmltests/hurz (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:623)
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:189)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:805)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1140)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:655)
at de.lesspain.xml.XML.getRootElementName(XML.java:69)
at de.lesspain.xml.XML.hasRootElement(XML.java:80)
at XMLEntityManagerErrorTest.main(XMLEntityManagerErrorTest.java:15)

Thanks in advance for any hints.

Solution

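A # in file and directory names makes a valid filesystem path, sure. But it cannot appear unescaped in the path part of a URL: there, # is reserved as the delimiter that starts the fragment identifier, so a literal # in a path has to be percent-encoded as %23.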
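To make the point concrete, here is a small sketch using java.net.URI. It is only an illustration of how a URI is split at the #, not the code path the parser itself takes; the class and method names (FragmentDemo, pathPart, fragmentPart) are made up for this example.

```java
import java.net.URI;

// Hypothetical demo class: shows how a URI containing an unescaped '#' is split.
final class FragmentDemo {

    // Returns the decoded path component of the given URI string.
    static String pathPart(String systemId) {
        return URI.create(systemId).getPath();
    }

    // Returns the decoded fragment component (the part after '#'), or null if there is none.
    static String fragmentPart(String systemId) {
        return URI.create(systemId).getFragment();
    }

    public static void main(String[] args) {
        // Unescaped '#': everything after it is the fragment, not part of the path.
        String raw = "file:/Volumes/RobExtL/xmltests/hurz#1/hurz.xml";
        System.out.println(pathPart(raw));      // /Volumes/RobExtL/xmltests/hurz
        System.out.println(fragmentPart(raw));  // 1/hurz.xml

        // Percent-encoded '#': it stays inside the path and is decoded back to '#'.
        String escaped = "file:/Volumes/RobExtL/xmltests/hurz%231/hurz.xml";
        System.out.println(pathPart(escaped));  // /Volumes/RobExtL/xmltests/hurz#1/hurz.xml
    }
}
```

The first path printed is exactly the truncated path from the FileNotFoundException in the question.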

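When working with XML APIs, it is common to specify the document to parse through its "system ID", which is just another name for its URI. A system ID is typically a URI relative to the current directory's URL, so it is easily mistaken for a relative file path, because the two usually behave the same. An unescaped # breaks that equivalence: the URL layer treats everything from the # onward as a fragment and drops it, which matches the truncated path /Volumes/RobExtL/xmltests/hurz in your FileNotFoundException.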
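Since the question does not show the parsing code, the following is only a sketch of one way around the problem, assuming the utility currently hands the parser a plain path string as system ID. It opens the file itself and gives the parser the byte stream, so the parser never has to resolve the path as a URL; the system ID is still set, from Path.toUri(), which percent-encodes the #. The names RootElementSketch and rootElementName are hypothetical and not taken from the question's XML class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reads only as far as the root element of an XML file.
final class RootElementSketch {

    static String rootElementName(Path xml)
            throws IOException, SAXException, ParserConfigurationException {
        final String[] root = new String[1];

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) throws SAXException {
                // The first start tag is the root element; remember it and abort the parse.
                root[0] = qName;
                throw new SAXException("root element found, stopping early");
            }
        };

        // Open the file through the filesystem API, so '#' is just a character in the path.
        try (InputStream in = Files.newInputStream(xml)) {
            InputSource source = new InputSource(in);
            // Properly escaped system ID ('#' becomes %23), used for resolving relative references.
            source.setSystemId(xml.toUri().toASCIIString());
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (SAXException e) {
            // Rethrow real parse errors; swallow only our own early abort.
            if (root[0] == null) {
                throw e;
            }
        }
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        // Usage: java RootElementSketch.java "/path/with#hash/file.xml"
        System.out.println(rootElementName(Paths.get(args[0])));
    }
}
```

If you would rather keep passing a system ID string, build it with File.toURI() or Path.toUri() instead of concatenating "file:" with the raw path, so that the # arrives at the parser as %23.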

You should have shown the code you use to parse, so that we could be sure. Without it, this diagnosis is an educated guess based on the stack trace.