I'm processing large (1TB) XML files using the StAX API. Let's assume we have a loop handling some elements:
XMLInputFactory fac = XMLInputFactory.newInstance();
XMLStreamReader reader = fac.createXMLStreamReader(new FileReader(inputFile));
while (reader.hasNext()) {
    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
        // handle contents
    }
}
How do I keep track of overall progress within the large XML file? Fetching the offset from the reader works fine for smaller files:
int offset = reader.getLocation().getCharacterOffset();
but since the offset is an int, it will probably only work for files up to 2GB.
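A simple FilterReader that counts the characters passing through it should work. Wrap the FileReader in it before handing it to createXMLStreamReader, then call getProgress() whenever you want to report. Note that it counts characters rather than bytes, and that the parser reads ahead in buffers, so the value runs slightly ahead of the element currently being handled.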
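import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Counts the characters the parser pulls through this reader.
class ProgressCounter extends FilterReader {
    private long progress = 0;

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        progress += skipped; // count what was actually skipped, not what was requested
        return skipped;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int count = super.read(cbuf, off, len);
        if (count > 0) {
            progress += count; // read returns -1 at end of stream
        }
        return count;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            progress++; // one character read, not the character's value
        }
        return c;
    }

    public ProgressCounter(Reader in) {
        super(in);
    }

    public long getProgress() {
        return progress;
    }
}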
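If you want a percentage, it may be easier to count bytes instead of characters, because File.length() is in bytes and the two differ for any non-ASCII content. The following is only a sketch of that variant, not part of the answer above: CountingInputStream, ProgressDemo, countStartElements and the reporting interval of one million elements are all hypothetical names and values chosen for illustration. It applies the same wrapping idea to an InputStream and lets the parser detect the encoding itself.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class ProgressDemo {

    /** Hypothetical byte-level counterpart of ProgressCounter. */
    public static class CountingInputStream extends FilterInputStream {
        private long count = 0;

        public CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                count++; // one byte consumed
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n; // n is -1 at end of stream
            }
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = super.skip(n);
            count += skipped;
            return skipped;
        }

        public long getCount() {
            return count;
        }
    }

    /** Walks the document, printing a percentage every million start elements. */
    public static long countStartElements(CountingInputStream in, long totalBytes)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        long elements = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                elements++;
                if (elements % 1_000_000 == 0) {
                    // Approximate: the parser buffers, so bytes read run ahead of the cursor.
                    System.out.printf("%.1f%%%n", 100.0 * in.getCount() / totalBytes);
                }
            }
        }
        reader.close();
        return elements;
    }

    public static void main(String[] args) throws Exception {
        File file = new File(args[0]);
        try (CountingInputStream in = new CountingInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            System.out.println(countStartElements(in, file.length()) + " start elements");
        }
    }
}
```

The counter is a long, so it is not subject to the 2GB limit of the int offset. Because the counting wrapper sits directly under the parser, it measures what the parser has pulled so far, which is close enough for a progress bar on a file of this size.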