
How does a typical medium to large web application "grow up" infrastructure-wise?


The skinny: When Facebook / Twitter / YouTube (whatever) went from a basic idea in software to... bigger (maybe 100,000 users?), how did they grow?

Is there a "best practices" growth path for medium sized web applications?

The real question: When specifying or bidding on a medium sized web application project, what are the biggies? In the case in question we will use a PHP framework, but it seems that these would mostly generalize to any language.

So the programmers for the core application are (to me) the most obvious part. We get the user management, user interface, and special classes made to handle the application. However, this seems to me to be less than half of the real project.

Ultimately, with good growth, infrastructure and meta-UI issues will be your main focus, right?

1) Infrastructure: cloud application space, data storage, db synchronization for multi-datacenter situations.

2) Language and cultural issues: making an app seem "likeable" or at least usable in the major "culture markets"

3) Data indexing issues

4) API / interoperability issues (both embedded apps à la Facebook and external access to data, both for end users and for major players like search engines, etc.)

...so, I am sure I am missing about half of them, and I have little idea how they should be prioritized.

The accepted answer here is a pretty good starting point for the answer I seek.


Solution

  • Expanding on item number one from your list would seem to be pretty key. It will help you decide what kinds of scaling issues you are going to face, and even allude to what classes of technology might be useful. Doing so will also touch on items 3 and 4, as those are kind of interrelated.

    Below is a large mess of questions that might help you get into the exploratory mindset to expand on your scalability thoughts. It's not a direct answer to your question, but hopefully a starting point:

    What are the features in your app like? Are they read heavy or write heavy, or maybe both? When you are viewing data, does it need to be the newest possible realtime state all the time, or can it be delayed? How far can it be delayed? Can caching help? Caching is the easy part; also think about how to EXPIRE the caches, because that's where the hard stuff is. What does the data you are working with look like? Is it highly relational, or more like separate documents?
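To make the "expiring is the hard part" point a bit more concrete, here is a minimal sketch of an in-process cache that combines the two usual expiry strategies: a time-to-live (the "how far can it be delayed?" answer) and explicit invalidation when the underlying data changes. The class name `TTLCache`, the `loader` callback, and the injectable `clock` are all made up for illustration; a real deployment would more likely sit on something like memcached or Redis, and the question's PHP framework would have its own cache layer.

```python
import time


class TTLCache:
    """Tiny in-process cache with time-based expiry and explicit invalidation.

    Hypothetical illustration only, not taken from any framework.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be exercised in tests
        self._store = {}        # key -> (value, expires_at)

    def get(self, key, loader):
        """Return the cached value; call loader() on a miss or after expiry."""
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]
        value = loader()        # e.g. the expensive database query
        self._store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Drop a key immediately, e.g. right after the underlying row is written."""
        self._store.pop(key, None)
```

The TTL bounds how stale a read can ever be, while `invalidate` covers the cases where a delay is not acceptable at all. Deciding which writes must trigger which invalidations is the part that tends to get difficult as the app grows.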

    What are your performance requirements? Does the app mostly generate reports in the background and email them out? Or does it need to display a perfect realtime map of all the current tweets from Twitter as they happen? Do updates from users need to immediately propagate to other users? Every user or just a subset of them? How fast does it have to do that? Does a page need to load in under 300ms or under 2s?

    Do external services have limitations of any kind, such as request limits or rate limits? If so, you'll need to queue and batch up requests. Are some of your external data providers sending you data faster than you can process it? You might need to queue it up, and you might need to make this part of the system scalable on its own, with a variable number of "workers" that can be scaled up and down. Consider applying that "individually scalable" principle to other parts of the system; it pays dividends in large installations.
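As a rough sketch of the "queue and batch up requests" idea, the snippet below collects items in a queue and sends them to an external service in batches, never more often than once per `min_interval` seconds. Everything here is hypothetical: `RateLimitedBatcher`, `send_batch`, and the parameter names are invented for the example, and the real limits would come from whatever the external provider documents.

```python
import time
from collections import deque


class RateLimitedBatcher:
    """Queue items and send them in batches, at most one batch per min_interval.

    Hypothetical illustration; send_batch stands in for the real API call.
    """

    def __init__(self, send_batch, batch_size, min_interval,
                 clock=time.monotonic, sleep=time.sleep):
        self.send_batch = send_batch
        self.batch_size = batch_size
        self.min_interval = min_interval    # seconds between outgoing requests
        self.clock = clock                  # injectable for testing
        self.sleep = sleep                  # injectable for testing
        self._pending = deque()
        self._last_sent = None

    def enqueue(self, item):
        self._pending.append(item)

    def drain(self):
        """Send everything queued; return the number of batches sent."""
        sent = 0
        while self._pending:
            if self._last_sent is not None:
                wait = self.min_interval - (self.clock() - self._last_sent)
                if wait > 0:
                    self.sleep(wait)        # stay under the provider's rate limit
            size = min(self.batch_size, len(self._pending))
            batch = [self._pending.popleft() for _ in range(size)]
            self.send_batch(batch)
            self._last_sent = self.clock()
            sent += 1
        return sent
```

In a larger installation the in-memory deque would presumably be replaced by a shared message queue, with several worker processes each running a loop like `drain`. That is what lets this one part of the system scale up and down without touching the rest.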

    Hope this helps somewhat :)