Here is my problem statement:
- Step 1: Call a MongoDB and get all the documents (basically the requests) for that day.
- Step 2: For each document from the above, make another call to a different MongoDB, which eventually fetches ~70k documents.
- Step 3: The above 70k documents (some of their fields) are then processed in chunks to create a .csv file.
- Step 4: This .csv file is then stored on a drive, and the location info is sent to a user via email.
- Step 5: Repeat steps 2 to 4 for each individual document from step 1.
To achieve this, I am exploring Spring Batch (it looks the most promising performance-wise), but I am not able to figure out how to do it. Any expert advice/guidance would be much appreciated.
Thanks,
From what I see, steps 2 to 4 map to a MongoItemReader (fetch the documents from Mongo), a processor that works on some of the fields, and a CSV item writer. Those three tasks should make up a single step and run in chunks.
The MongoItemReader should look like:
@Bean
@StepScope
public MongoItemReader<Document> reader(MongoTemplate mongoTemplate) {
    // Field names here are illustrative; 'expiredDate' is the query cut-off
    // (see the launch sketch below for one way to supply it).
    Criteria expiredFiles = Criteria.where("DateCreated").lte(expiredDate);

    HashMap<String, Sort.Direction> sortMap = new HashMap<>();
    sortMap.put("DateCreated", Sort.Direction.DESC);

    MongoItemReader<Document> reader = new MongoItemReader<>();
    reader.setTemplate(mongoTemplate);
    reader.setSort(sortMap);
    reader.setTargetType(Document.class);
    reader.setQuery(Query.query(expiredFiles).getQueryObject().toString());
    return reader;
}
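Note that expiredDate is not defined in the snippet above. Since the reader is @StepScope, one option is to add a parameter annotated with @Value("#{jobParameters['expiredDate']}") and supply the value when launching the job. A minimal launch sketch under that assumption (the parameter names are illustrative; jobLauncher is the standard JobLauncher bean and job is the Job bean defined further down):

JobParameters params = new JobParametersBuilder()
        .addDate("expiredDate", new Date())            // cut-off date for the query
        .addLong("run.id", System.currentTimeMillis()) // makes each run unique
        .toJobParameters();
jobLauncher.run(job, params);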
The processor does some work on the fields:
public class DocumentProcessor implements ItemProcessor<Document, Document> {

    @Override
    public Document process(Document doc) {
        // do some stuff with doc, e.g. keep only the fields the CSV needs
        return doc;
    }
}
and a FlatFileItemWriter:
@Bean
public FlatFileItemWriter<Document> writer(Resource outputResource) {
    BeanWrapperFieldExtractor<Document> fieldExtractor = new BeanWrapperFieldExtractor<>();
    fieldExtractor.setNames(new String[] { "docID", "docName", "docPath" });

    DelimitedLineAggregator<Document> lineAggregator = new DelimitedLineAggregator<>();
    lineAggregator.setDelimiter(",");
    lineAggregator.setFieldExtractor(fieldExtractor);

    FlatFileItemWriter<Document> writer = new FlatFileItemWriter<>();
    writer.setResource(outputResource);
    writer.setAppendAllowed(true);
    writer.setLineAggregator(lineAggregator);
    return writer;
}
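Since steps 2 to 4 repeat for each request document (step 5), each run will typically need its own output file. A sketch of a step-scoped variant, assuming the target path is passed as a job parameter named outputFile (that name is an assumption, not something from the question):

@Bean
@StepScope
public FlatFileItemWriter<Document> writer(
        @Value("#{jobParameters['outputFile']}") String outputFile) {
    FlatFileItemWriter<Document> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource(outputFile));          // one file per run
    writer.setHeaderCallback(w -> w.write("docID,docName,docPath")); // CSV header row

    BeanWrapperFieldExtractor<Document> fieldExtractor = new BeanWrapperFieldExtractor<>();
    fieldExtractor.setNames(new String[] { "docID", "docName", "docPath" });
    DelimitedLineAggregator<Document> lineAggregator = new DelimitedLineAggregator<>();
    lineAggregator.setDelimiter(",");
    lineAggregator.setFieldExtractor(fieldExtractor);
    writer.setLineAggregator(lineAggregator);
    return writer;
}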
The read-process-write step:
@Bean
public Step readProcessWriteStep(StepBuilderFactory stepBuilderFactory,
                                 MongoItemReader<Document> reader,
                                 ItemProcessor<Document, Document> processor,
                                 FlatFileItemWriter<Document> writer) {
    return stepBuilderFactory
            .get("readProcessWriteStep")
            .<Document, Document>chunk(100)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
The job definition should include the first reading step (step 1):
@Bean
public Job job(JobBuilderFactory jobBuilderFactory,
               Step readMongoFirstTime,
               Step readProcessWriteStep,
               Step emailStep) {
    return jobBuilderFactory
            .get(appName)
            .start(readMongoFirstTime)
            .next(readProcessWriteStep)
            .next(emailStep)
            .build();
}
Depending on how much data the first step reads, you can pass information from step 1 to the second step through the job execution context. Be careful, though, if it is a lot of data.
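The usual way to do that is to put values into the step execution context in step 1 and promote the keys to the job execution context with an ExecutionContextPromotionListener. A sketch, where the key requestIds and the findTodaysRequestIds() helper are hypothetical:

@Bean
public Step readMongoFirstTime(StepBuilderFactory stepBuilderFactory) {
    return stepBuilderFactory
            .get("readMongoFirstTime")
            .tasklet((contribution, chunkContext) -> {
                // Fetch today's request documents and stash their ids in the
                // *step* execution context; the listener below promotes them.
                chunkContext.getStepContext().getStepExecution()
                        .getExecutionContext()
                        .put("requestIds", findTodaysRequestIds()); // hypothetical helper
                return RepeatStatus.FINISHED;
            })
            .listener(promotionListener())
            .build();
}

@Bean
public ExecutionContextPromotionListener promotionListener() {
    // Copies the listed keys from the step context into the job context
    // at the end of the step, so later steps can read them.
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    listener.setKeys(new String[] { "requestIds" });
    return listener;
}

A downstream @StepScope bean can then receive the value via @Value("#{jobExecutionContext['requestIds']}").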
The last step needs to send the email. Another thing to watch out for is the amount of space the created files occupy.
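The email step can be a simple tasklet. A minimal sketch using Spring's JavaMailSender, where the recipient address and the file path are placeholders:

@Bean
public Step emailStep(StepBuilderFactory stepBuilderFactory, JavaMailSender mailSender) {
    return stepBuilderFactory
            .get("emailStep")
            .tasklet((contribution, chunkContext) -> {
                SimpleMailMessage message = new SimpleMailMessage();
                message.setTo("user@example.com");                              // placeholder recipient
                message.setSubject("Your daily CSV report is ready");
                message.setText("The file was stored at: /path/to/report.csv"); // placeholder location
                mailSender.send(message);
                return RepeatStatus.FINISHED;
            })
            .build();
}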