I want to find related posts by finding posts with the highest number of tags in common with the post in question. I can do this in a SQL query, but since I'll have thousands of posts with dozens of tags each I was wondering if CouchDB would be a better option? If so, how would it work in CouchDB?
Thinking about this problem relationally, here is the structure:
post: id, (other post data)
tag: id, (other tag data)
post_tag: post_id, tag_id
SQL to find the top 10 most related posts for post #X:
select r.post_id
from post_tag p join post_tag r on p.tag_id = r.tag_id
where p.post_id = X and r.post_id != X
group by r.post_id
order by count(*) desc
limit 10
This query gives the top 10 posts with the most tags in common with post X.
Can I get the same results from CouchDB more efficiently? How?
This query gives the top 10 posts with the most tags in common with post X.
This kind of queries is horrible to compute with Map/Reduce, since, for every document, you would have to generate a key for every combination of tags. Moreover, you would need several chained map/reduces to achieve that efficiently. So CouchDB is definitely not the easy way to do that.
As Marcin wrote, you could "cheat" and use a search server over CouchDB, but your SQL query seems to be more elegant.