neo4j twitter4j relationships

Issue in counting relationships for nodes in Neo4j


I've got a Neo4j database in which hashtags and tweets are stored. Every tweet has a topic property, which defines the topic it belongs to. If I run the following query, I get the most popular hashtags in the db, no matter the topic:

MATCH (h:Hashtag)
RETURN h.text AS hashtag, size( (h)<--() ) AS degree ORDER BY degree DESC

I'd like to get the most popular tags for a single topic. I tried this:

MATCH (h:Hashtag)<--(t:Tweet{topic:'test'})
RETURN h.text AS hashtag, size( (h)<--(t) ) AS degree ORDER BY degree DESC

and this:

MATCH (h:Hashtag)
RETURN h.text AS hashtag, size( (h)<--(t:Tweet{topic:'test'}) ) AS degree ORDER BY degree DESC

while the next one takes forever to run:

MATCH (h:Hashtag), (t:Tweet)
WHERE t.topic='test'
RETURN h.text AS hashtag, size( (h)<--(t) ) AS degree ORDER BY degree DESC

What should I do? Thanks.


Solution

  • In Cypher, when you return the result of an aggregation function, you get an implicit "group by" on whatever else you return alongside it. SIZE() is not an aggregation function, so it gives you the size of the pattern for each row, with no grouping. COUNT() is an aggregation function:

    MATCH (t:Tweet {topic:'test'})-->(h:Hashtag)
    RETURN h, COUNT(*) AS num ORDER BY num DESC LIMIT 10
    

    This query counts the matched Tweet nodes, grouped by Hashtag.
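
To make the implicit "group by" more concrete, here is a rough Python sketch of what the COUNT(*) query does with the matched rows. It does not talk to Neo4j at all: the `relationships` list and the `top_hashtags` helper are made up for illustration, with each tuple standing in for one matched Tweet-->Hashtag row.

```python
from collections import Counter

# Hypothetical sample data: one (tweet id, topic, hashtag text) tuple
# per Tweet-->Hashtag relationship. Not taken from a real database.
relationships = [
    (1, "test", "neo4j"),
    (2, "test", "neo4j"),
    (2, "test", "cypher"),
    (3, "other", "neo4j"),
    (4, "test", "graph"),
    (5, "test", "neo4j"),
]


def top_hashtags(rows, topic, limit=10):
    """Keep rows for one topic, then count rows per hashtag.

    Roughly mirrors:
        MATCH (t:Tweet {topic: ...})-->(h:Hashtag)
        RETURN h, COUNT(*) AS num ORDER BY num DESC LIMIT ...
    """
    # The filter plays the role of the {topic:'test'} property match;
    # Counter groups the remaining rows by hashtag, like the implicit group by.
    counts = Counter(tag for _, row_topic, tag in rows if row_topic == topic)
    # most_common sorts by count, descending, and applies the limit.
    return counts.most_common(limit)


result = top_hashtags(relationships, "test")
print(result)
```

The tweet with topic "other" is dropped before counting, which is the behaviour the question asks for: only relationships coming from tweets in the chosen topic contribute to a hashtag's count.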