apostrophe-cms

Scalability: what is the best way to handle a lot of comments on Apostrophe?


I'm working on a project whose database will hopefully grow, and I'm unsure how to design a comments system. I'm planning with rough estimates, and it would be awesome to reach even 10% of the goal, but I'd like to be prepared if things start to grow.

I'd like to implement user comments on widgets that are inside a custom piece. Each piece will have hundreds of these widgets, each widget may have hundreds of comments, and there will be thousands of pieces. Non-admin users won't be able to edit either the piece or the widgets, only the comments.

The alternatives I've thought of are:

  1. Comments as widgets inside the main piece: it would be nice not to need custom queries to load the comments on pages, but this would probably be very slow and heavy. Editing the piece would be impossible, at least in Chrome (imagine the editor modal loading 500 widgets, each with 200 comments: 100,000 comments loaded at once!), and it would be hard to search for comments across all pieces and widgets.
  2. Comments as a piece, associated with the main piece through _id fields: better, because it would be possible to moderate comments one by one, with filters, and to load them via AJAX with pagination when the user clicks a widget. But won't it crowd the aposDocs collection and affect other pieces? For instance, 10,000 main pieces with 100,000 comment pieces each = 1 billion comment pieces. If each comment takes 5 kB, the total size would be ~4.66 TB, which is OK for MongoDB, but is it OK for Apostrophe? Can I go for it?
  3. Comments in another collection: this would be more organized and wouldn't affect other pieces, but I would lose the ability to manage comments natively with Apostrophe, rich text for comments, and so on, which is very sad. Is there an easy way to create a piece that stores the comments in another collection (e.g. commentsDoc) while keeping Apostrophe's functionality?

Which is best? Are there other ways? Are there other important things to consider?


Solution

  • I understand that you're planning optimistically, which is why I'm responding pessimistically. (: Which is to say, I'm taking you at your word about the numbers and talking about the outcomes you'll experience.

    First, you're correct, 100,000 comments wouldn't fit in a MongoDB document (16MB limit) and the performance would be terrible.
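
    To put rough numbers on that, here is a back-of-the-envelope sketch. Every constant is an assumption taken from the estimates in the question (500 widgets per piece, 200 comments per widget, 10,000 pieces, "5kB" read as 5 KiB per comment); the variable names are made up for illustration.

    ```javascript
    // Rough sizing using the question's own estimates (all figures are assumptions).
    const PIECES = 10000;
    const WIDGETS_PER_PIECE = 500;
    const COMMENTS_PER_WIDGET = 200;
    const BYTES_PER_COMMENT = 5 * 1024;            // "5kB" read as 5 KiB
    const MONGO_MAX_DOC_BYTES = 16 * 1024 * 1024;  // MongoDB's 16 MB document size limit

    // Comments that would have to live inside a single piece document.
    const commentsPerPiece = WIDGETS_PER_PIECE * COMMENTS_PER_WIDGET;
    const bytesPerPiece = commentsPerPiece * BYTES_PER_COMMENT;
    const timesOverLimit = bytesPerPiece / MONGO_MAX_DOC_BYTES;

    // Total across all pieces, if every comment were stored somewhere.
    const totalBytes = PIECES * bytesPerPiece;

    console.log(commentsPerPiece);                                  // 100000
    console.log((bytesPerPiece / 1024 ** 2).toFixed(0) + ' MiB');   // 488 MiB
    console.log(timesOverLimit.toFixed(1) + 'x the 16 MB limit');   // 30.5x the 16 MB limit
    console.log((totalBytes / 1024 ** 4).toFixed(2) + ' TiB');      // 4.66 TiB
    ```

    So under these assumptions a single piece with embedded comments would be roughly 30 times larger than MongoDB allows, before counting the widgets themselves.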

    Second, while using a join in each widget to fetch the comments would theoretically work, you would in practice still be loading 100,000 comments on a typical pageview, which would exhaust resources both server-side and browser-side. And you would have SEO problems too.

    So, you'll need to think about this in another way. If you have hundreds of widgets in a single "document," that isn't really a single document. That is, users won't experience it that way. I don't have much information about your use case, but my guess is that each "document" is really more of a knowledge base and not every pageview will involve reading all of the hundreds of widgets.

    So, break these documents up. Each "widget" in your current design should be a piece in its own right. It should have a permalink — a URL at which users can share it and Google can index it. This is what you get with apostrophe-pieces-pages.
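
    As a rough sketch of that split, assuming Apostrophe 2.x: each former widget could become its own piece type with a join back to its parent. The module name `sections`, the parent piece type `guide`, and the field names are all hypothetical and not taken from your project.

    ```javascript
    // lib/modules/sections/index.js (hypothetical module name)
    module.exports = {
      extend: 'apostrophe-pieces',
      name: 'section',
      label: 'Section',
      pluralLabel: 'Sections',
      addFields: [
        {
          // Join back to the parent "meta-document".
          // Assumes a separate piece type named 'guide' exists.
          name: '_guide',
          label: 'Guide',
          type: 'joinByOne',
          withType: 'guide',
          idField: 'guideId'
        },
        {
          // The content that used to live in the widget.
          name: 'body',
          label: 'Body',
          type: 'area',
          options: {
            widgets: { 'apostrophe-rich-text': {} }
          }
        }
      ]
    };
    ```

    A companion module that extends apostrophe-pieces-pages would then supply the index page and the per-piece permalink.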

    But, if you want to present them in a "one-pager" design without traditional pagination, you can achieve that with good performance using the infinite scroll features of apostrophe-pieces-pages. This autoloads more content in the background if and when the user engages that long and scrolls (almost) that far. See "Reusable content with pieces" in the Apostrophe documentation for more on that subject.

    And, you can add filtering using the piecesFilters option to allow navigation among separate "meta-documents" (what you are currently thinking of as single pieces).
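
    A minimal sketch of that option, assuming Apostrophe 2.x and a hypothetical pieces-pages module named `sections-pages` whose pieces carry a join called `_guide` pointing at the parent "meta-document". Both names are illustrative, and it is worth checking the piecesFilters documentation for the exact filter name to use with a join.

    ```javascript
    // lib/modules/sections-pages/index.js (hypothetical module name)
    module.exports = {
      extend: 'apostrophe-pieces-pages',
      // Page size for the index view; 10 is an arbitrary illustrative value.
      perPage: 10,
      // Expose a filter on the assumed `_guide` join so the index
      // can be narrowed to a single "meta-document".
      piecesFilters: [
        { name: '_guide' }
      ]
    };
    ```

    With a filter configured this way, the index template should receive the available choices under data.piecesFilters, which you can render as navigation links.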

    Also, consider using Disqus for comments. It's free, and you don't have to implement moderation, etc. yourself. It would greatly simplify your implementation unless there's a compelling reason to self-host comments and worry about user signup and so forth.

    I think that's about as specific as it's possible to be without understanding your use case better.