Tags: java, jvm, neo4j, database-performance, spring-data-neo4j

save method of CrudRepository is very slow?


I want to store some data in my Neo4j database. I use spring-data-neo4j for that.

My code looks like this:

    for (int i = 0; i < newRisks.size(); i++) {
        myRepository.save(newRisks.get(i));
        System.out.println("saved " + newRisks.get(i).name);
    }

My newRisks list contains about 60,000 objects and 60,000 edges. Every node and edge has one property. The loop takes about 15 to 20 minutes; is this normal? I used Java VisualVM to look for bottlenecks, but average CPU usage was only 10 to 25% (of 4 cores) and the heap was less than half full.

Are there any options to speed up this operation?


EDIT: Additionally, on the first call of myRepository.save(newRisks.get(i)), the JVM seems to hang for several minutes before the first output appears.

Second EDIT:

Class Risk:

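    import java.util.HashSet;
    import java.util.Set;

    import org.neo4j.graphdb.Direction;
    import org.springframework.data.neo4j.annotation.Indexed;
    import org.springframework.data.neo4j.annotation.NodeEntity;
    import org.springframework.data.neo4j.annotation.RelatedTo;

    @NodeEntity
    public class Risk {
        //...
        @Indexed
        public String name;

        @RelatedTo(type = "CHILD", direction = Direction.OUTGOING)
        Set<Risk> risk = new HashSet<Risk>();

        public void addChild(Risk child) {
            risk.add(child);
        }

        //...
    }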

Creating Risks:

    @Autowired
    private Repository myRepository;

    @Transactional
    public Collection<Risk> makeSomeRisks() {

        ArrayList<Risk> newRisks = new ArrayList<Risk>();

        newRisks.add(new Risk("Root"));

        for (int i = 0; i < 60000; i++) {
            Risk risk = new Risk("risk " + (i + 1));
            newRisks.get(0).addChild(risk);
            newRisks.add(risk);
        }

        for (int i = 0; i < newRisks.size(); i++) {
            myRepository.save(newRisks.get(i));
        }

        return newRisks;
    }

Solution

  • The problem here is that you are doing mass inserts with an API that is not intended for that.

    You create one root Risk with 60k children. When you save the root first, it also persists the 60k children at the same time (and creates the relationships); that's why the first save takes so long. Then you save all the children again.

    There are several ways to speed this up with SDN:

    1. Don't use the collection approach for mass inserts. Persist both participants and use template.createRelationshipBetween(root, child, "CHILD", false);

    2. Persist the children first, then add all the persisted children to the root object and persist that.

    3. As you did, use the Neo4j core API, but call template.postEntityCreation(node, Risk.class) so that you can access the entities via SDN. Then you also have to index the entities yourself (db.index().forNodes("Risk").add(node, "name", name);), or use the Neo4j core API auto-index, but that's not compatible with SDN.

    4. Regardless of whether you use the core API or SDN, you should use transaction sizes of around 10k to 20k nodes/relationships for best performance.
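Point 4 amounts to committing every 10k to 20k writes instead of doing all 60,000 saves in one transaction. Below is a minimal, library-free Java 8 sketch of only the batching logic, under stated assumptions: `BatchedInsert`, `TxRunner` and `saveInBatches` are hypothetical names, not SDN or Neo4j API. In real code, `TxRunner` would wrap whatever begins and commits a transaction (for example Neo4j's `beginTx()` or Spring's `TransactionTemplate`), and the `save` callback would call `myRepository.save(...)` or the core-API node creation from point 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedInsert {

    /**
     * Hypothetical abstraction for "run this work inside one transaction".
     * A real implementation would begin a transaction, run the work and commit it.
     */
    interface TxRunner {
        void inTransaction(Runnable work);
    }

    /**
     * Saves items in consecutive batches of at most batchSize, one transaction per batch.
     * Returns the number of transactions that were run.
     */
    static <T> int saveInBatches(List<T> items, int batchSize, TxRunner tx, Consumer<T> save) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int transactions = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            // subList is a view; the last batch may be shorter than batchSize
            List<T> batch = items.subList(from, Math.min(from + batchSize, items.size()));
            tx.inTransaction(() -> batch.forEach(save));
            transactions++;
        }
        return transactions;
    }

    public static void main(String[] args) {
        // Stand-ins for the 60,000 child Risk objects from the question
        List<String> children = new ArrayList<>();
        for (int i = 0; i < 60000; i++) {
            children.add("risk " + (i + 1));
        }

        // Fake transaction runner that just runs the work; no database is involved
        TxRunner fakeTx = Runnable::run;
        List<String> saved = new ArrayList<>();

        int transactions = saveInBatches(children, 15000, fakeTx, saved::add);
        System.out.println(transactions + " transactions, " + saved.size() + " entities");
    }
}
```

With 60,000 items and a batch size of 15,000, `main` prints `4 transactions, 60000 entities`. One caveat, assuming Spring's default `REQUIRED` propagation: because `makeSomeRisks` is annotated `@Transactional`, any per-batch transactions started inside it would join the single outer transaction, so the batching loop would need to run outside that method's transaction to get separate commits.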