JanusGraph/Titan reconnaissance


I am currently building an application for a social market network using JanusGraph 0.2, and I am wondering whether my decision to use JanusGraph was wrong. My difficulties are:

  1. My SQL database contains over 443,000 records about people, buildings, shops, and other entities, but each time I try to insert data into the graph it takes 10 minutes per 1,000 records. That is driving me crazy.
  2. I created some composite and mixed indexes before inserting data, but once data has been inserted I can't create a new index (its property keys get stuck in the INSTALLED state), so I am forced to recreate the schema and insert the data again.
  3. After inserting the data I tried to reindex, because some indexes don't seem to work well. This works for composite indexes but not for mixed indexes.
  4. I am using Gremlin.Net to talk to JanusGraph, but I can't use textContains to perform queries like select * from People where firstname like 'janus'. So why am I using an index backend like Elasticsearch at all?

I am using JanusGraph v0.2, Cassandra v3.11, Elasticsearch 5.6.3, and Gremlin.Net as the client library. I don't know where the mistake is: the database technology (JanusGraph), the programming language used, or both. Please address each point by its number. I am new to graph databases, so please explain.


Solution

  • It's a bit hard to answer all of your questions as they address at least 3 different concerns. So I will just try to answer t

    1. My SQL database contains over 443,000 records about people, buildings, shops, and other entities, but each time I try to insert data into the graph it takes 10 minutes per 1,000 records. That is driving me crazy.

    10 minutes for 1000 records sounds extremely slow, even when your records are relatively large. However, it's not really possible to tell you what you're doing wrong without knowing the details of your setup and how you insert the data exactly. So I would suggest that you create a separate post for that in the JanusGraph users group where you describe your setup, the data you are inserting, and the configuration and also share the code that you are using to insert the data. (You could of course also post this as a SO question, but my feeling is that it is a bit too specific to your use case for SO.)

    In general you might want to check this blog post about optimizing write performance with JanusGraph: https://www.experoinc.com/post/janusgraph-nuts-and-bolts-part-1-write-performance
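
    One pattern that often matters for bulk inserts is committing once per batch of records instead of once per record. Whether that is the cause here is only an assumption, since the insert code was not shared. The sketch below is plain Java with no JanusGraph dependency: `BatchLoader` and `loadInBatches` are hypothetical names, and the `insert` and `commit` callbacks stand in for whatever your client code does to add one element and to commit the open transaction.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of JanusGraph or Gremlin.Net.
final class BatchLoader {
    private BatchLoader() {}

    /**
     * Calls insert for every record and runs commit once per batchSize records,
     * plus once more for a trailing partial batch.
     * Returns the number of commits that were performed.
     */
    static <T> int loadInBatches(List<T> records, int batchSize,
                                 Consumer<T> insert, Runnable commit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int pending = 0;
        for (T record : records) {
            insert.accept(record);   // stand-in for adding one vertex or edge
            pending++;
            if (pending == batchSize) {
                commit.run();        // stand-in for committing the transaction
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {           // flush the last, partially filled batch
            commit.run();
            commits++;
        }
        return commits;
    }
}
```

    With 443,000 records and a batch size of 1,000, this would issue 443 commits instead of 443,000. The same loop structure carries over to C#; a suitable batch size is something to measure rather than assume.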

    2. I created some composite and mixed indexes before inserting data, but once data has been inserted I can't create a new index (its property keys get stuck in the INSTALLED state), so I am forced to recreate the schema and insert the data again.

    It should always be possible to create a new index. It just requires re-indexing the existing data so that this data can also be searched with the newly created index.

    Unfortunately, this re-indexing sometimes fails, and from what I can see it is currently one of the most frequently reported problems with JanusGraph. For that reason, I would expect this area to be improved in future versions of JanusGraph.

    If you have specific problems with the re-indexing, then I suggest again that you create a separate mailing list post or SO question where you share more details. Otherwise it is really hard to help you. The same also applies to your third question.

    4. I am using Gremlin.Net to talk to JanusGraph, but I can't use textContains to perform queries like select * from People where firstname like 'janus'. So why am I using an index backend like Elasticsearch at all?

    That is definitely an important feature currently missing for .NET users of JanusGraph (which also includes myself) and probably also for users of other non-JVM based languages. I just started a discussion on the developer list of JanusGraph about how this support can be added for languages like Python, JavaScript, and for .NET in general.

    Any contributions are of course more than welcome!

    I don't know where the mistake is: the database technology (JanusGraph) [...]

    At least some of your problems can be attributed to the fact that these technologies are relatively new. JanusGraph had its first release about one year ago, and the first official release of Gremlin.Net as part of Apache TinkerPop was only 4 months ago. So there are definitely features that are still missing, and some things don't work as smoothly as you might expect. Constructive feedback and bug reports of course help to improve these projects and, again, you can always contribute by submitting a pull request when a feature that is important to you is missing. In the end, both JanusGraph and Apache TinkerPop, to which Gremlin and Gremlin.Net belong, are community-driven projects.

    I don't know whether I really answered your specific problems, especially questions 2 and 3 about indexes. Maybe someone with more knowledge in this area can give you a better answer. (More information would definitely be helpful, though, and separate posts for the different problems would also make it easier to help you.)