performance, indexing, import, neo4j, bulk-load

Neo4j bulk import and indexing


I'm importing a big dataset (well over 10 million nodes) into Neo4j using the neo4j-import tool. After importing my data I run several queries over it. One of those queries performs very badly. I optimized it as much as I could (using PROFILE, specifying relationship types, splitting it up for multi-core support, and so on).

It still takes too long, so my idea was to tell Neo4j to start at a specific type of node by using the USING INDEX clause. I could then check how my db hits change and possibly make it work. Right now, though, my database doesn't have any indexes.

I wanted to create indexes once I was done writing all the queries I need, but it seems I need to start using them already.

I'm wondering if I can create those indexes during the bulk import process. That seems to be a good solution to me. How would I do that?

I also wonder whether it's possible to write a statement that creates an index for a property that exists on every single one of my nodes (let's call it "type").

CREATE INDEX ON :(type);

doesn't work (the label is missing, but I want to omit it).


Solution

  • Indexes are defined on a label plus a property, so the label cannot be omitted. You need indexes right after your import and before you start trying to optimize queries. Anything your query will use to find a starting point should be indexed (user_id, object_id, etc.), and probably any dates or properties used for range queries (modified_on, weight, etc.).

    CREATE INDEX ON :Label(property)
    

    Cypher queries are single-threaded, so I have no idea what you mean by multi-core support. What did you read about that? Got a link? You can multi-thread Neo4j, but at this point you have to do it manually. See https://maxdemarzi.com/2017/01/06/multi-threading-a-traversal/

    Most of the time, a query can be greatly optimized by adding an index or by expressing it differently. But sometimes you need to redo your model to fit the query. Take a look at https://maxdemarzi.com/2015/08/26/modeling-airline-flights-in-neo4j/ for some hints.
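
To make the indexing advice above concrete, here is a minimal sketch of how it could look for the "type" property from the question. It assumes that every node was given a shared label during import; the label name `Entity` and the value `'person'` are hypothetical and not from the original post. The syntax is the older `CREATE INDEX ON` form used in this answer; newer Neo4j releases use a different form (`CREATE INDEX FOR (n:Label) ON (n.property)`), so check which one your version accepts.

```cypher
// Assumption: every node carries the hypothetical label :Entity,
// since an index always needs a label as well as a property.
CREATE INDEX ON :Entity(type);

// The index is populated in the background. Make sure it is ONLINE
// (for example with :schema in the Neo4j Browser) before profiling.

// Profile the query with an explicit index hint and compare db hits
// against the same query without the USING INDEX line.
PROFILE
MATCH (n:Entity)
USING INDEX n:Entity(type)
WHERE n.type = 'person'   // 'person' is a made-up example value
RETURN count(n);
```

If the nodes have several different labels and no shared one, one `CREATE INDEX ON :Label(type)` statement per label would be needed instead.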