java inputstream byte-order-mark rss-reader

RSS reader vs BOM error


I'm trying to read an RSS feed (an XML file) into my application. The problem is that the feed starts with a BOM (byte order mark) that my input stream doesn't like: it throws an error, which triggers another error, and everything dies.

Here's the method:

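import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Builds a DOM tree from XML text; returns null if parsing fails.
private Document getDomFromXMLString(String xml) {
    Document doc = null;
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Feed the parser characters (not bytes) via a StringReader.
        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(xml));
        doc = db.parse(is);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return doc;
}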

So I'm trying to figure out how to skip the BOM and read in the rest of the file.


Solution

  • If you have a character stream (and a String counts as one), then skipping the BOM is as easy as checking whether the first character is the BOM, U+FEFF, and stripping it if so:

    if (xml.startsWith("\uFEFF"))
        xml = xml.substring(1);
    

    What you should really do, though, is ask the source to fix its feed; the BOM shouldn't be there in the first place.
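
Putting the check and the parsing together, a minimal sketch might look like the following. The class name `BomStrippingParser` and its methods `stripBom` and `parse` are made up for illustration and are not part of the original code. It also assumes you would rather get an unchecked exception on bad input than a `null` document, which is a design choice you may not share.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name and methods are illustrative only.
class BomStrippingParser {

    // U+FEFF is what a decoded byte order mark looks like inside a String.
    private static final String BOM = "\uFEFF";

    // Returns the input without a leading BOM; null and empty input pass through unchanged.
    static String stripBom(String xml) {
        if (xml != null && xml.startsWith(BOM)) {
            return xml.substring(1);
        }
        return xml;
    }

    // Parses XML text into a DOM tree after removing a leading BOM, if any.
    // Assumption: failures are rethrown unchecked instead of returning null.
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(stripBom(xml))));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        // A tiny made-up feed that starts with a BOM.
        String feed = BOM + "<?xml version=\"1.0\"?><rss><channel><title>Demo</title></channel></rss>";
        Document doc = parse(feed);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Using `startsWith` rather than `charAt(0)` means an empty string is simply returned unchanged instead of throwing a `StringIndexOutOfBoundsException`.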