java stax

Current state END_ELEMENT is not among the statesCHARACTERS, COMMENT, CDATA, SPACE, ENTITY_REFERENCE, DTD valid for getText()


I'm new to Java, and I'm doing this project for school. I have a 4GB XML file (a Wikipedia dump) that I need to parse. I use StAX, and my code runs successfully for more than 400,000 lines (almost 50MB), but then I get this error:

Exception in thread "main" java.lang.IllegalStateException: Current state END_ELEMENT is not among the statesCHARACTERS, COMMENT, CDATA, SPACE, ENTITY_REFERENCE, DTD valid for getText()
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getText(XMLStreamReaderImpl.java:1081)
    at tagremoving1.TagRemoving1.main(TagRemoving1.java:65)

I read somewhere that before calling getText() I should check for null or empty elements, so I did. It then gets further but stops again with the same error. I have looked almost everywhere and don't know what's wrong. This is my code:

    XMLInputFactory factory = XMLInputFactory.newInstance();
    File file = new File("source.xml");
    FileInputStream fileReader = new FileInputStream(file);
    factory.setProperty(XMLInputFactory.IS_COALESCING, true);
    factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, true);
    factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
    PrintWriter writer1 = new PrintWriter("result.txt", "UTF-8");

    XMLStreamReader reader = factory.createXMLStreamReader(fileReader);
    int counter = 1;
    while(reader.hasNext()){

        if(reader.next() == 1){ //If it is START_ELEMENT
            String name = reader.getLocalName();
            switch(name){
                case "page":
                    writer1.println("\r\npage" + counter + ":");  
                    counter++;
                    break;

                case "title":
                    reader.next();
                    if(reader != null && !"".equals(reader.toString()))
                        writer1.println("Title: " + reader.getText());
                    break;

                case "text":
                    reader.next();
                    if(reader != null && !"".equals(reader.toString()))
                        writer1.println("Text: " + reader.getText());
                    break;

                default:
                    break;
            }
        }

    }
    writer1.flush();
    writer1.close();

Any suggestions?


Solution

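  • Well, I figured it out!

    I added another condition, reader.hasText(), to the final 'if', and then everything was fine. When an element is empty, the event right after START_ELEMENT is END_ELEMENT, and getText() is not valid in that state; the checks on reader itself do not catch this, because reader is never null and reader.toString() says nothing about the element's content. Here is the code: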

    case "text":
        reader.next();
        if(reader != null && !"".equals(reader.toString()) && reader.hasText())
            writer1.println("Text: " + reader.getText());
        break;
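
    For reference, here is a minimal, self-contained sketch of the same idea, applied to both the "title" and "text" cases. It is an illustration and not the original program: the class name StaxTextSketch and the helper extract() are made up for this example, it reads a small in-memory string instead of source.xml, and it assumes (as in the Wikipedia dump) that these elements contain only text and no child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class name, used only for this sketch.
public class StaxTextSketch {

    // Hypothetical helper: returns "Title: ..." / "Text: ..." lines,
    // skipping elements that have no text content.
    static List<String> extract(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character events so each element's text arrives in one piece.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if (!name.equals("title") && !name.equals("text")) {
                    continue;
                }
                String label = name.equals("title") ? "Title: " : "Text: ";
                // Advance past the start tag. For an empty element such as
                // <text/> this lands on END_ELEMENT, where getText() would throw.
                reader.next();
                if (reader.hasText()) {
                    lines.add(label + reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<page><title>Example</title><text/></page>";
        for (String line : extract(xml)) {
            System.out.println(line);
        }
    }
}
```

    Note that hasText() is also true for comment events, so if an element could start with a comment, checking reader.isCharacters() instead may be the safer test.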