We have a use case to load 100M records from a shared object storage bucket into MongoDB, using the connection resource code below:
HttpURLConnection httpConnection = null;
try {
    httpConnection = (HttpURLConnection) this.url.openConnection();
    ResourceUtils.useCachesIfNecessary(httpConnection);
    // Request only this partition's byte range when one is configured.
    if (StringUtils.hasText(byteRangeHeader)) {
        httpConnection.setRequestProperty("Range", String.format("bytes=%s", byteRangeHeader));
    }
    inputStream = httpConnection.getInputStream();
} catch (Exception e) {
    e.printStackTrace();
}
return inputStream;
We partition the file using "Range" headers and load the 100M records with 15 threads, which takes around 30 minutes. The problem is that the HTTP connection gets closed by networking devices after 15 minutes. How do we handle this scenario?
You did not mention which reader you use, but if it inherits from AbstractItemCountingItemStreamItemReader, then it could be used in a restart scenario even if it reads from a remote resource (it will resume reading from the last offset saved in the meta-data repository).
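To illustrate what resuming from a saved position involves, here is a minimal plain-Java sketch. It assumes one record per line and a saved count of records that were already read. The class and method names are hypothetical, and this is not Spring Batch's actual implementation; it only shows the idea of reopening the stream and skipping what was already processed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper: reopens a line-based source and skips already-read records. */
final class ResumeByItemCount {

    /**
     * Wraps the source in a BufferedReader and discards the first
     * itemsAlreadyRead lines, so the next readLine() returns the
     * first unprocessed record.
     */
    static BufferedReader reopenAt(Reader source, int itemsAlreadyRead) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        for (int i = 0; i < itemsAlreadyRead; i++) {
            if (reader.readLine() == null) {
                break; // source is shorter than the saved count
            }
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: two records were processed before the connection dropped.
        BufferedReader reader = reopenAt(new StringReader("a\nb\nc\n"), 2);
        System.out.println(reader.readLine()); // prints "c"
    }
}
```

In a real job the saved count would come from the step's execution context rather than a hard-coded value, and the source would be a freshly opened connection for the same partition.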
Another option, if you have enough local storage, is to download the file or stage it in a database table/collection in a first step, and then process it in a next step.
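The download step could look roughly like the sketch below, which copies a remote stream to a local file so that the later processing step no longer depends on a long-lived HTTP connection. The class name, method name, and file names are hypothetical; in practice the input stream would be the one returned by the resource code in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: stages a remote stream in a local file. */
final class StageLocally {

    /** Copies the whole stream to target, replacing any existing file, and closes the stream. */
    static Path stage(InputStream remote, Path target) throws IOException {
        try (InputStream in = remote) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the remote stream; a real job would pass the HTTP input stream.
        InputStream remote = new ByteArrayInputStream("r1\nr2\n".getBytes(StandardCharsets.UTF_8));
        Path staged = stage(remote, Files.createTempFile("staged-records", ".dat"));
        System.out.println(Files.size(staged)); // prints 6
    }
}
```

The copy itself is a short sequential transfer, so it is less exposed to the 15-minute cutoff than record-by-record processing over the same connection. The second step can then read the local file with any file-based reader.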