The following code works really slowly: it takes almost 30 seconds to process 400 entities:
int page = 0;
org.springframework.data.domain.Page<MyEntity> slice = null;
while (true) {
    if (slice == null) {
        slice = repo.findAll(PageRequest.of(page, 400, Sort.by("date")));
    } else {
        slice = repo.findAll(slice.nextPageable());
    }
    // Update and save the current page before checking for more pages;
    // breaking first would skip the last page.
    slice.getContent().forEach(v -> v.setApp(SApplication.NAME_XXX));
    repo.saveAll(slice.getContent());
    LOGGER.info("processed: {}", page);
    page++;
    if (!slice.hasNext()) {
        break;
    }
}
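Independently of speed, a page-by-page loop like this one has to process the current page before deciding whether to fetch the next one; if it breaks on `!hasNext()` first, the final page is never saved. The following sketch models that control flow in plain Java. The `Slice` record and the `fetch` helper are hypothetical in-memory stand-ins for Spring Data's `Slice` and `repo.findAll(...)`, so the sketch runs without a datastore.

```java
import java.util.ArrayList;
import java.util.List;

public class SliceLoopSketch {

    /** Hypothetical stand-in for a Spring Data Slice: one page of content plus a hasNext flag. */
    record Slice<T>(List<T> content, int pageNumber, boolean hasNext) {}

    /** Hypothetical stand-in for repo.findAll(pageable): returns page `page` of `data`. */
    static <T> Slice<T> fetch(List<T> data, int page, int size) {
        int from = Math.min(page * size, data.size());
        int to = Math.min(from + size, data.size());
        return new Slice<>(data.subList(from, to), page, to < data.size());
    }

    /** Processes every slice, including the last one, and returns the processed items. */
    static <T> List<T> processAll(List<T> data, int size) {
        List<T> processed = new ArrayList<>();
        Slice<T> slice = fetch(data, 0, size);
        while (true) {
            // Process the current slice first...
            processed.addAll(slice.content());
            // ...and only then decide whether there is another slice to fetch.
            if (!slice.hasNext()) {
                break;
            }
            slice = fetch(data, slice.pageNumber() + 1, size);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        System.out.println(processAll(data, 400).size()); // 1000
    }
}
```

With 1000 items and a page size of 400, the three slices hold 400, 400 and 200 items, and all 1000 are processed because the last slice is handled before the loop exits.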
Instead, I use the following code, based on the Google Cloud client library for Datastore, which takes 4-6 seconds per 400 entities:
Datastore service = DatastoreOptions.getDefaultInstance().getService();
EntityQuery.Builder query = Query.newEntityQueryBuilder();
int limit = 400;
query.setKind("ENTITY_KIND").setLimit(limit);
int count = 0;
Cursor cursor = null;
while (true) {
    if (cursor != null) {
        query.setStartCursor(cursor);
    }
    QueryResults<Entity> queryResult = service.run(query.build());
    List<Entity> entityList = new ArrayList<>();
    while (queryResult.hasNext()) {
        // Copy each loaded entity and overwrite its "app" property
        Entity loadEntity = queryResult.next();
        entityList.add(Entity.newBuilder(loadEntity).set("app", SApplication.NAME_XXX.name()).build());
    }
    // Write the whole batch with a single put call
    if (!entityList.isEmpty()) {
        service.put(entityList.toArray(new Entity[0]));
    }
    count += entityList.size();
    LOGGER.info("Processed: {}", count);
    // A full batch means there may be more entities: resume from the cursor
    if (entityList.size() == limit) {
        cursor = queryResult.getCursorAfter();
    } else {
        break;
    }
}
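The cursor pattern used here can be modeled without Datastore: each query resumes from the cursor returned by the previous one, every batch is written back with one put, and a batch shorter than the limit ends the loop. In this sketch, `InMemoryStore`, `Batch` and the integer cursor are hypothetical stand-ins for the Datastore kind, `QueryResults` and `Cursor`; they are not part of the google-cloud-datastore API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CursorBatchSketch {

    /** One batch of result keys plus a cursor pointing just past the last result. */
    record Batch(List<String> keys, int cursorAfter) {}

    /** Hypothetical in-memory stand-in for a Datastore kind: key -> "app" property. */
    static final class InMemoryStore {
        private final List<String> keys = new ArrayList<>();
        private final Map<String, String> app = new HashMap<>();

        InMemoryStore(int n) {
            for (int i = 0; i < n; i++) {
                String key = "k" + i;
                keys.add(key);
                app.put(key, "OLD");
            }
        }

        /** Models a query with a start cursor and a limit: resumes where the cursor points. */
        Batch run(int startCursor, int limit) {
            int end = Math.min(startCursor + limit, keys.size());
            return new Batch(new ArrayList<>(keys.subList(startCursor, end)), end);
        }

        /** Models a batched put: writes all updates in one call. */
        void put(Map<String, String> updates) {
            app.putAll(updates);
        }

        String appOf(String key) {
            return app.get(key);
        }
    }

    /** Updates every entity in batches of `limit` and returns the number of entities written. */
    static int updateAll(InMemoryStore store, int limit, String newApp) {
        int count = 0;
        int cursor = 0;
        while (true) {
            Batch batch = store.run(cursor, limit);
            Map<String, String> updates = new HashMap<>();
            for (String key : batch.keys()) {
                updates.put(key, newApp);
            }
            store.put(updates);
            count += updates.size();
            // A short batch means the query is exhausted; a full batch means there may be more.
            if (batch.keys().size() < limit) {
                break;
            }
            cursor = batch.cursorAfter();
        }
        return count;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore(1000);
        System.out.println(updateAll(store, 400, "NEW")); // 1000
        System.out.println(store.appOf("k999"));          // NEW
    }
}
```

When the total is an exact multiple of the limit, the loop makes one extra query that returns an empty batch; that is the price of using "batch size equals limit" as the continuation test.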
Why can't I use Spring to do this batch processing efficiently?
Full discussion here: https://github.com/spring-cloud/spring-cloud-gcp/issues/1824
First:
you need to use the correct spring-cloud-gcp version: at least 1.2.0.M2.
Second:
you need to declare a new method in the repository interface:
@Query("select * from your_kind")
Slice<TestEntity> findAllSlice(Pageable pageable);
The final code looks like this:
LOGGER.info("start");
int page = 0;
Slice<TestEntity> slice = null;
while (true) {
    if (slice == null) {
        slice = repo.findAllSlice(DatastorePageable.of(page, 400, Sort.by("date")));
    } else {
        slice = repo.findAllSlice(slice.nextPageable());
    }
    // Process the current slice before checking for more, so the last slice is not skipped
    slice.getContent().forEach(v -> v.setApp("xx"));
    repo.saveAll(slice.getContent());
    LOGGER.info("processed: {}", page);
    page++;
    if (!slice.hasNext()) {
        break;
    }
}
LOGGER.info("end");
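One plausible reason the `Slice` version is faster, stated here as an assumption rather than something confirmed in this thread: Google's Datastore documentation notes that with an offset query the skipped entities are still retrieved internally, whereas a cursor resumes exactly where the previous batch stopped. If each page were fetched by offset, the work per page would grow with the page number. The following back-of-the-envelope model counts entity reads for both strategies; it is a hypothetical cost model, not a measurement of either library.

```java
public class PagingCostSketch {

    /** Entities read when each page is fetched with an offset: skipped rows are still scanned. */
    static long offsetScans(long total, long pageSize) {
        long scanned = 0;
        for (long start = 0; start < total; start += pageSize) {
            scanned += Math.min(start + pageSize, total); // skipped + returned
        }
        return scanned;
    }

    /** Entities read when each page resumes from a cursor: only returned rows are scanned. */
    static long cursorScans(long total, long pageSize) {
        long scanned = 0;
        for (long start = 0; start < total; start += pageSize) {
            scanned += Math.min(pageSize, total - start);
        }
        return scanned;
    }

    public static void main(String[] args) {
        System.out.println(offsetScans(2000, 400)); // 6000
        System.out.println(cursorScans(2000, 400)); // 2000
    }
}
```

Under this model the offset strategy grows roughly quadratically with the number of pages, while the cursor strategy reads each entity once.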