java, linux, string-parsing

Why would Java save empty strings when parsing this XML?


I am trying to have Java parse the lines of an XML file and, when a line contains a specific word, extract the value from that element. This is all done through string manipulation. I have tested locally that this works, using a copy of the same file that it will read on the server where it will eventually run.

However, when run on the remote server it does not work as intended: it produces only empty strings where it previously produced the text from the elements. The server is also actively reading from this file, but that should happen only once, at runtime, for a different process. The code still logs the correct number of lines, which shows it is seeing the same lines as before, and it can correctly write those lines to a log file when it prints them in full.

This is the function that handles the parsing of the file:

private HashSet<String> parseFile() throws ProcessingException{
    String fileLocation = getInterfaceLocation();
    HashSet<String> fileMasks = new HashSet<String>();

    try {
        File file = new File(fileLocation);
        BufferedReader br = new BufferedReader(new FileReader(file));

        boolean inFileSet = false;

        String line = "";
        while((line = br.readLine()) != null) {
            if(line.toLowerCase().contains("<fileset")) {
                inFileSet = true;
                continue;
            }
            if(line.toLowerCase().contains("</fileset>")) {
                inFileSet = false;
            }

            if(inFileSet) {
                log(line);
                if(!line.toLowerCase().contains("<include")) {
                    continue;
                }
                else {
                    line = line.substring(line.indexOf("name=") + "name=".length() + 1);
                    line = line.substring(0, line.indexOf("\""));
                    log("Adding mask = ", line);
                    fileMasks.add(line);
                }
            }
        }
        br.close();
    } catch(IOException e) {
        throw new ProcessingException("Unable to open the TESTFILE.xml file",e);
    }
    return fileMasks;
}

And here is the applicable portion of the XML file that it is parsing:

<fileset>             
    <include name="filetype1*.csv"/>
    <include name="filetype2*.csv"/>
    <include name="filetype3*.csv"/>
    <include name="filetype4*.csv"/>
    <include name="filetype5*.csv"/>
    <include name="filetype6*.csv"/>
    <include name="filetype7*.csv"/>
    <include name="filetype8*.csv"/>
    <include name="filetype9*.csv"/>
    <include name="filetype10*.csv"/>
    <include name="filetype11*.csv"/>
    <include name="filetype12*.csv"/>
    <include name="filetype13*.csv"/>
    <include name="filetype14*.csv"/>
</fileset>

In my test environment (Windows 10) I see the following output:

<include name="filetype1*.csv"/>
Adding mask = filetype1*.csv
<include name="filetype2*.csv"/>
Adding mask = filetype2*.csv
<include name="filetype3*.csv"/>
...
<include name="filetype14*.csv"/>
Adding mask = filetype14*.csv

And in the remote server's environment I get:

<include name="filetype1*.csv"/>
Adding mask = 
<include name="filetype2*.csv"/>
Adding mask = 
<include name="filetype3*.csv"/>
...
<include name="filetype14*.csv"/>
Adding mask = 

Solution

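  • The following should be a drop-in replacement for your parseFile method. It uses XPath to find the data you are looking for, so it reads attribute values from the parsed document instead of slicing each line by hand.

    The XPath expression //include[@name] means: "give me every <include> element in the document, regardless of location, that has a name attribute". Unlike your original loop, it does not require the element to be inside a <fileset>; if that matters, //fileset//include[@name] restricts the match.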

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.*;
    
    import java.io.File;
    import java.util.HashSet;
    
    /* Other code here */
    
    private HashSet<String> parseFile()
            throws ProcessingException
    {
        String fileLocation = getInterfaceLocation();
        HashSet<String> fileMasks = new HashSet<>();
    
        File file = new File(fileLocation);
    
        try {
            // BEGIN: DOM Boilerplate
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            XPathFactory xPathfactory = XPathFactory.newInstance();
            XPath xpath = xPathfactory.newXPath();
            // END: DOM Boilerplate
    
            Document doc = builder.parse(file);
    
            XPathExpression includeQuery = xpath.compile("//include[@name]");
            NodeList includes = (NodeList) includeQuery.evaluate(doc, XPathConstants.NODESET);
            for (int i = 0; i < includes.getLength(); i++) {
                Element include = (Element) includes.item(i);
                fileMasks.add(include.getAttribute("name"));
            }
        } catch (Exception e) {
            throw new ProcessingException("Failed to parse file", e);
        }
    
        return fileMasks;
    }
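
To try the XPath approach without touching the real file, here is a minimal, self-contained sketch that parses an in-memory string instead. The class name IncludeMaskDemo, the helper parseMasks, the <project> wrapper element, and the "outside.csv" entry are all made up for illustration; only the <fileset>/<include name="..."> shape comes from the question. It uses the scoped expression //fileset//include/@name, so it selects the attributes directly and ignores any <include> outside a <fileset>.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IncludeMaskDemo {

    // Hypothetical helper: collects the name attribute of every <include>
    // found anywhere below a <fileset> element.
    static Set<String> parseMasks(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Select the attribute nodes themselves, scoped to <fileset>.
            NodeList names = (NodeList) xpath.evaluate(
                    "//fileset//include/@name", doc, XPathConstants.NODESET);

            Set<String> masks = new LinkedHashSet<>();
            for (int i = 0; i < names.getLength(); i++) {
                masks.add(names.item(i).getNodeValue());
            }
            return masks;
        } catch (Exception e) {
            // Wrapped in an unchecked exception only to keep the demo short.
            throw new IllegalStateException("Failed to parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample input; the <project> wrapper and "outside.csv" are assumptions.
        String xml =
                "<project>\n"
              + "  <fileset>\n"
              + "    <include name=\"filetype1*.csv\"/>\n"
              + "    <include name='filetype2*.csv' />\n"
              + "  </fileset>\n"
              + "  <include name=\"outside.csv\"/>\n"
              + "</project>";
        System.out.println(parseMasks(xml));
    }
}
```

Note that the second <include> uses single quotes and extra whitespace. A line-based substring approach that searches for a double quote would mishandle it, while the XML parser returns the attribute value either way.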