I have a big graph model and I need to write the result of the following query to a CSV file.
MATCH (u:USER)-[r:PURCHASED]->(o:ORDER)-[h:HAS]->(i:ITEM) RETURN u.id AS userId, i.product_number AS itemId
When I "Explain" query, this is the result I get :
It shows that the estimated result is something around 9M. My problems are :
1) It takes a lot of time to get a response. From neo4j-shell it takes 38 minutes! Is this normal? BTW, I have all the schema indexes in place and they are all ONLINE.
2) When I use Spring Data Neo4j (SDN) to fetch the result, it throws a java.lang.OutOfMemoryError: GC overhead limit exceeded, and that happens when SDN tries to convert the loaded data to our @QueryResult object.
I tried to optimize the query in several different ways, but nothing changed! My impression is that I am doing something wrong. Does anyone have any idea how I can solve this problem? Should I go for batch read/write?
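To give an idea of the SDN side, our mapping looks roughly like the sketch below (simplified, with names changed; User is our @NodeEntity domain class, not shown here). The repository method hands back the whole result in one go, which is where the conversion to @QueryResult objects happens:

import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.annotation.QueryResult;
import org.springframework.data.neo4j.repository.GraphRepository;

// Simplified sketch with names changed: the DTO each result row is mapped to.
@QueryResult
public class UserItemRow {
    public Long userId;
    public Long itemId;
}

// Repository over our User entity (entity class not shown).
// The method returns the whole result at once, so all ~9M rows get
// converted to UserItemRow objects in memory before we see any of them.
public interface UserRepository extends GraphRepository<User> {
    @Query("MATCH (u:USER)-[r:PURCHASED]->(o:ORDER)-[h:HAS]->(i:ITEM) "
         + "RETURN u.id AS userId, i.product_number AS itemId")
    Iterable<UserItemRow> findAllUserItemPairs();
}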
P.S. I am using Neo4j Community Edition, version 3.0.1, and this is my system info:
and these are my server configs:
dbms.jvm.additional=-Dunsupported.dbms.udc.source=tarball
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=3G
neostore.relationshipstore.db.mapped_memory=4G
neostore.propertystore.db.mapped_memory=3G
neostore.propertystore.db.strings.mapped_memory=1000M
neostore.propertystore.db.index.keys.mapped_memory=500M
neostore.propertystore.db.index.mapped_memory=500M
Thanks to Vince's and Michael's comments, I found a solution! After doing some experiments it became clear that the server response time is actually good: about 1.5 minutes for 9 million rows! The problem is with SDN, as Vince mentioned: the OOM happens when SDN tries to convert the data to @QueryResult objects. Increasing the heap memory for our application is not a permanent solution, as we will have even more rows in the future, so we decided to use the neo4j-jdbc-driver for big-data queries... and it works like a jet! Here is the code example we used:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Register the Neo4j JDBC driver
Class.forName("org.neo4j.jdbc.Driver");

try (Connection con = DriverManager.getConnection("jdbc:neo4j:bolt://HOST:PORT", "USER", "PASSWORD");
     // Output file path is just an example
     BufferedWriter writer = new BufferedWriter(new FileWriter("result.csv"))) {
    String query = "MATCH (u:USER)-[r:PURCHASED]->(o:ORDER)-[h:HAS]->(i:ITEM) "
                 + "RETURN u.id AS userId, i.product_number AS itemId";
    con.setAutoCommit(false); // important for large datasets
    try (Statement st = con.createStatement()) {
        st.setFetchSize(50); // important for large datasets; set before executing the query
        try (ResultSet rs = st.executeQuery(query)) {
            while (rs.next()) {
                writer.write(rs.getInt("userId") + "," + rs.getInt("itemId"));
                writer.newLine();
            }
        }
    }
}
Just make sure you use con.setAutoCommit(false) and st.setFetchSize(50) if you know you are going to load a large dataset. Thanks, everyone!
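As far as I can tell, these two settings are what let the driver stream the result through the open transaction instead of buffering all rows in memory first, which is why the export now runs in roughly constant memory regardless of the row count.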