Tags: neo4j, spring-data-neo4j, spring-data-neo4j-4, neo4j-ogm

Loading 9M rows from Neo4j and writing it into CSV throws out of memory exception


I have a big graph model and I need to write the result of following query into a csv.

Match (u:USER)-[r:PURCHASED]->(o:ORDER)-[h:HAS]->(i:ITEM) return u.id as userId,i.product_number as itemId

When I EXPLAIN the query, this is the plan I get (screenshot omitted).

It shows that the estimated result is around 9M rows. My problems are:

1) It takes a lot of time to get a response. From neo4j-shell it takes 38 minutes! Is this normal? BTW, I have all schema indexes in place and they are all ONLINE.

2) When I use Spring Data Neo4j to fetch the result, it throws a "java.lang.OutOfMemoryError: GC overhead limit exceeded", and that happens when SDN tries to convert the loaded data to our @QueryResult object.

I tried to optimize the query in all the ways I could think of, but nothing changed! My impression is that I am doing something wrong. Does anyone have an idea how I can solve this problem? Should I go for batch read/write?
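(Editor's aside, not from the original thread: if installing plugins on the server is an option, APOC's CSV export procedure can write the file server-side, which avoids client-side object mapping entirely. A sketch, assuming APOC is installed, file export is enabled, and the path is writable by the Neo4j process:)

    CALL apoc.export.csv.query(
      "MATCH (u:USER)-[:PURCHASED]->(:ORDER)-[:HAS]->(i:ITEM)
       RETURN u.id AS userId, i.product_number AS itemId",
      "/tmp/user_items.csv",
      {})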

P.S. I am using Neo4j Community Edition version 3.0.1, and this is my system info (screenshot omitted).

and these are my server configs.

dbms.jvm.additional=-Dunsupported.dbms.udc.source=tarball
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=3G
neostore.relationshipstore.db.mapped_memory=4G
neostore.propertystore.db.mapped_memory=3G
neostore.propertystore.db.strings.mapped_memory=1000M
neostore.propertystore.db.index.keys.mapped_memory=500M
neostore.propertystore.db.index.mapped_memory=500M
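(Editor's aside, not from the original thread: the mapped_memory settings above are the Neo4j 2.x style and are ignored by 3.x. In Neo4j 3.0 the equivalents would look roughly like the following; setting names are from the 3.0 configuration reference, and the sizes are illustrative, not recommendations:)

    # neo4j.conf (Neo4j 3.x)
    dbms.memory.pagecache.size=8g

    # neo4j-wrapper.conf (Neo4j 3.0)
    dbms.memory.heap.initial_size=2g
    dbms.memory.heap.max_size=4g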

Solution

  • Thanks to Vince's and Michael's comments I found a solution! After some experiments it became clear that the server response time is actually good: about 1.5 minutes for 9 million rows! The problem is with SDN, as Vince mentioned: the OOM happens when SDN tries to convert the data to @QueryResult objects. Increasing the heap for our application is not a permanent solution, as we will have more rows in the future, so we decided to use the neo4j-jdbc driver for big-data queries... and it works like a jet! Here is the code example we used:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    Class.forName("org.neo4j.jdbc.Driver");
    try (Connection con = DriverManager.getConnection("jdbc:neo4j:bolt://HOST:PORT", "USER", "PASSWORD");
         BufferedWriter writer = new BufferedWriter(new FileWriter("user_items.csv"))) {

        String query = "MATCH (u:USER)-[r:PURCHASED]->(o:ORDER)-[h:HAS]->(i:ITEM) "
                     + "RETURN u.id AS userId, i.product_number AS itemId";

        con.setAutoCommit(false); // important for large result sets
        try (Statement st = con.createStatement()) {
            st.setFetchSize(50); // stream rows in small batches instead of loading all 9M at once

            try (ResultSet rs = st.executeQuery(query)) {
                while (rs.next()) {
                    writer.write(rs.getInt("userId") + "," + rs.getInt("itemId"));
                    writer.newLine();
                }
            }
        }
    }
    

    Just make sure you use `con.setAutoCommit(false)` and `st.setFetchSize(50)` if you know you are going to load a large dataset. Thanks everyone!