I have an application that streams Twitter data and stores it in a Neo4j database. The data I store covers tweets, users, hashtags, and their relationships (user posts tweet, tweet tags hashtag, user retweets tweet). Each time I get a new tweet, what I do is:
And so on, with the same process for saving the relationships.
Here are the queries:
static String cqlAddTweet = "merge (n:Tweet{tweet_id: {2}}) on create set n.text={1}, n.location={3}, n.likecount={4}, n.retweetcount={5}, n.topic={6}, n.created_at={7} on match set n.likecount={4}, n.retweetcount={5}";
static String cqlAddHT = "merge (n:Hashtag{text:{1}})";
static String cqlHTToTweet = "match (n:Tweet),(m:Hashtag) where n.tweet_id={1} and m.text={2} merge (n)-[:TAGS]->(m)";
static String cqlAddUser = "merge (n:User{user_id:{3}}) on create set n.name={1}, n.username={2}, n.followers={4}, n.following={5}, n.profilePic={6} on match set n.name={1}, n.username={2}, n.followers={4}, n.following={5}, n.profilePic={6}";
static String cqlUserToTweet = "match (n:User),(m:Tweet) where m.tweet_id={2} and n.user_id={1} merge (n)-[:POSTS]->(m)";
static String cqlUserRetweets = "match (n:Tweet{tweet_id:{1}}), (u:User{user_id:{2}}) create (u)-[:RETWEETS]->(n)";
Saving data is very slow, so I suspect the system would perform better if it did not run all of these queries, which scan the data, for every tweet.
Do you have any suggestions for improving my application?
Thank you, and apologies in advance if this seems silly.
Make sure you have indexes (or uniqueness constraints, if appropriate) on the following label/property pairs. That lets each query find its starting nodes through the index instead of scanning every node with the same label.
:Tweet(tweet_id)
:Hashtag(text)
:User(user_id)
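As a rough sketch, the schema statements for those three pairs could be kept alongside the other query strings and run once at startup. The class and field names here (SchemaSetup, CONSTRAINTS, INDEXES) are made up for illustration. The Cypher uses the older syntax that goes with the {1}-style parameters in the question; on recent Neo4j versions the equivalent forms are CREATE CONSTRAINT FOR (t:Tweet) REQUIRE t.tweet_id IS UNIQUE and CREATE INDEX FOR (t:Tweet) ON (t.tweet_id), so check which one your server accepts.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names are illustrative, not part of any Neo4j API.
class SchemaSetup {
    // Uniqueness constraints (older Neo4j syntax). Each one is also backed by an
    // index, so MERGE and MATCH on these properties can use an index lookup.
    static final List<String> CONSTRAINTS = Arrays.asList(
        "CREATE CONSTRAINT ON (t:Tweet) ASSERT t.tweet_id IS UNIQUE",
        "CREATE CONSTRAINT ON (h:Hashtag) ASSERT h.text IS UNIQUE",
        "CREATE CONSTRAINT ON (u:User) ASSERT u.user_id IS UNIQUE"
    );

    // Plain indexes, as an alternative if the values are not guaranteed unique.
    // Use either the constraint or the index for a given pair, not both.
    static final List<String> INDEXES = Arrays.asList(
        "CREATE INDEX ON :Tweet(tweet_id)",
        "CREATE INDEX ON :Hashtag(text)",
        "CREATE INDEX ON :User(user_id)"
    );

    public static void main(String[] args) {
        // Print the statements so they can be pasted into the Neo4j browser or shell,
        // or executed once through whatever connection the application already uses.
        for (String statement : CONSTRAINTS) {
            System.out.println(statement + ";");
        }
    }
}
```

Creating a uniqueness constraint fails if duplicate values already exist, so existing duplicates would need to be cleaned up first. Prefixing one of your queries with EXPLAIN or PROFILE should then show an index seek rather than a label scan in the plan.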
By the way, a couple of your queries can be simplified, because properties that are set identically in both ON CREATE and ON MATCH can go in a single plain SET (this should not affect performance):
static String cqlAddTweet = "MERGE (n:Tweet{tweet_id: {2}}) ON CREATE SET n.text={1}, n.location={3}, n.topic={6}, n.created_at={7} SET n.likecount={4}, n.retweetcount={5}";
static String cqlAddUser = "MERGE (n:User{user_id:{3}}) SET n.name={1}, n.username={2}, n.followers={4}, n.following={5}, n.profilePic={6}";