Spring GCP - Datastore performance: batch processing by iterating over all entities is very slow


The following code is really slow; it takes almost 30 seconds to process 400 entities:

    int page = 0;
    org.springframework.data.domain.Page<MyEntity> slice =
            repo.findAll(PageRequest.of(page, 400, Sort.by("date")));
    while (true) {
        slice.getContent().forEach(v -> v.setApp(SApplication.NAME_XXX));
        repo.saveAll(slice.getContent());
        LOGGER.info("processed: " + page);
        // Check hasNext() only after processing, so the last page is not skipped.
        if (!slice.hasNext()) {
            break;
        }
        slice = repo.findAll(slice.nextPageable());
        page++;
    }

Using the Google Cloud Datastore client library directly instead, the same work takes 4-6 seconds per 400 entities:

    Datastore service = DatastoreOptions.getDefaultInstance().getService();
    StructuredQuery.Builder<?> query = Query.newEntityQueryBuilder();
    int limit = 400;
    query.setKind("ENTITY_KIND").setLimit(limit);

    int count = 0;
    Cursor cursor = null;
    while (true) {
        if (cursor != null) {
            query.setStartCursor(cursor);
        }
        QueryResults<?> queryResult = service.run(query.build());

        List<Entity> entityList = new ArrayList<>();
        while (queryResult.hasNext()) {
            Entity loadEntity = (Entity) queryResult.next();
            Entity.Builder newEntity = Entity.newBuilder(loadEntity).set("app", SApplication.NAME_XXX.name());
            entityList.add(newEntity.build());
        }
        service.put(entityList.toArray(new Entity[0]));
        count += entityList.size();

        if (entityList.size() == limit) {
            cursor = queryResult.getCursorAfter();
        } else {
            break;
        }
        LOGGER.info("Processed: {}", count);
    }

Why can't I use Spring for this kind of batch processing?


Solution

  • Full discussion here: https://github.com/spring-cloud/spring-cloud-gcp/issues/1824

    First:

    use a recent enough version of spring-cloud-gcp: at least 1.2.0.M2.

    Second:

    declare a new query method in the repository interface that returns a Slice instead of a Page:

        @Query("select * from your_kind")
        Slice<TestEntity> findAllSlice(Pageable pageable);

    The final code looks like this:

        LOGGER.info("start");
        int page = 0;
        Slice<TestEntity> slice = repo.findAllSlice(DatastorePageable.of(page, 400, Sort.by("date")));
        while (true) {
            slice.getContent().forEach(v -> v.setApp("xx"));
            repo.saveAll(slice.getContent());
            LOGGER.info("processed: " + page);
            // Check hasNext() only after processing, so the last slice is not skipped.
            if (!slice.hasNext()) {
                break;
            }
            slice = repo.findAllSlice(slice.nextPageable());
            page++;
        }
        LOGGER.info("end");
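
For intuition on why a Slice can be cheaper than a Page: a Page has to report the total number of elements, while a Slice only has to know whether more data follows. The sketch below is a plain in-memory illustration of that idea, not the Spring Data or spring-cloud-gcp API. `SliceWalkSketch`, `SimpleSlice`, `readSlice` and `forEachSlice` are hypothetical names, and the "read one extra item" look-ahead is a general technique assumed here for illustration; it is not confirmed to be how spring-cloud-gcp implements slices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** In-memory stand-in that illustrates slice-style paging; not a real Spring Data type. */
final class SliceWalkSketch {

    /** Hypothetical minimal slice: one batch of items plus a "more data follows" flag. */
    static final class SimpleSlice<T> {
        final List<T> content;
        final boolean hasNext;
        final int nextOffset;

        SimpleSlice(List<T> content, boolean hasNext, int nextOffset) {
            this.content = content;
            this.hasNext = hasNext;
            this.nextOffset = nextOffset;
        }
    }

    /** Reads up to {@code size} items from {@code offset}, peeking one extra item instead of counting everything. */
    static <T> SimpleSlice<T> readSlice(List<T> source, int offset, int size) {
        int start = Math.min(offset, source.size());
        int end = Math.min(source.size(), offset + size + 1); // look-ahead of one item
        List<T> fetched = new ArrayList<>(source.subList(start, end));
        boolean hasNext = fetched.size() > size;
        if (hasNext) {
            fetched.remove(fetched.size() - 1); // drop the look-ahead item from the batch
        }
        return new SimpleSlice<>(fetched, hasNext, offset + fetched.size());
    }

    /** Applies {@code action} to every batch, including the last one; returns the number of batches. */
    static <T> int forEachSlice(List<T> source, int size, Consumer<List<T>> action) {
        int slices = 0;
        int offset = 0;
        while (true) {
            SimpleSlice<T> slice = readSlice(source, offset, size);
            if (slice.content.isEmpty()) {
                break; // nothing to process at all
            }
            action.accept(slice.content);
            slices++;
            if (!slice.hasNext) {
                break; // the last batch has already been processed
            }
            offset = slice.nextOffset;
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        int[] processed = {0};
        int slices = forEachSlice(data, 400, batch -> processed[0] += batch.size());
        System.out.println(slices + " slices, " + processed[0] + " items"); // prints "3 slices, 1000 items"
    }
}
```

With 1000 items and a batch size of 400, the walk yields batches of 400, 400 and 200 without ever asking the source for its total size.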