amazon-web-services amazon-cloudfront

Should I use S3 or S3 + CloudFront for a photo sharing website?


I'm building a photo sharing application and am trying to decide whether to host my users' photos on S3 alone or on S3 + CloudFront. The photos my users share probably won't be accessed much (about 5 times each on average), so is CloudFront a wise choice? Also, when CloudFront receives a visitor, does that result in PUT/GET requests against my S3 bucket?

Please correct me if I have this wrong:

The pricing for using CloudFront + S3 is: S3 storage + CloudFront bandwidth.


Solution

  • You pay a price per request that hits CloudFront, for bandwidth from CloudFront to the browser, and for bandwidth (at a reduced rate) between CloudFront and S3. You also pay for each GET from S3 when CloudFront doesn't have a copy of the object at the edge location where the request arrived.
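
As a rough illustration of how those four charges add up, here is a minimal sketch. The `Rates` fields and the `delivery_cost` helper are hypothetical names, and the numbers in the usage line are placeholders rather than real AWS prices. S3 storage is billed separately and is not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder unit prices. These are NOT real AWS prices."""
    cloudfront_request: float  # per request that reaches CloudFront
    edge_to_browser_gb: float  # per GB sent from CloudFront to the browser
    origin_to_edge_gb: float   # per GB fetched by CloudFront from S3
    s3_get: float              # per S3 GET issued on a cache miss


def delivery_cost(views: int, cache_misses: int, object_gb: float, rates: Rates) -> float:
    """Delivery cost of one object served through CloudFront (storage excluded)."""
    if not 0 <= cache_misses <= views:
        raise ValueError("cache_misses must be between 0 and views")
    request_cost = views * rates.cloudfront_request
    edge_bandwidth = views * object_gb * rates.edge_to_browser_gb
    # Only cache misses cause traffic and GET requests against the bucket.
    origin_bandwidth = cache_misses * object_gb * rates.origin_to_edge_gb
    origin_requests = cache_misses * rates.s3_get
    return request_cost + edge_bandwidth + origin_bandwidth + origin_requests


# Usage with made-up rates: 1000 views, 250 of them cache misses, 1 MB object.
example_rates = Rates(0.001, 0.1, 0.02, 0.002)
example_total = delivery_cost(1000, 250, 0.001, example_rates)
```

Plugging in current prices from the AWS pricing pages and a realistic miss count shows how much of the bill comes from origin fetches.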

    CloudFront has dozens of edge locations around the globe and routes requests to them using location-based DNS. When you request an object from CloudFront, your request goes to the (theoretically) best location based on where your IP address suggests you're located, subject to the pricing package the distribution is configured with. You can choose never to send requests to the higher-cost locations, which makes the service slower for some of your users but cheaper for you.

    If an object is requested from a CloudFront location that doesn't already have it cached from a previous request, that location has to fetch it from S3. CloudFront is not a predictive, monolithic entity: an object apparently has to be requested through an individual edge location before that location holds a copy. There are more CloudFront edge locations than there are AWS regions. For example, there are edge locations in South Bend (Indiana), Atlanta, Dallas, and St. Louis, but a request routed through St. Louis does not put a copy of your object in South Bend; that happens only when a request for the same object arrives there.

    For a site where each image is requested only a few times, CloudFront doesn't make much sense, because the odds of the image already being cached are small, and caching objects geographically closer to the end user is the whole point of CloudFront. If the object isn't at the edge, the request is no faster and potentially a little slower, because CloudFront has to fetch the object from S3 and then serve it back to the browser. In that case you're paying extra for something that doesn't give you much.
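
To put rough numbers on that argument, here is a minimal sketch under a deliberately simple model that is my assumption, not something CloudFront documents: each request lands on one of `edge_locations` edges uniformly at random, and nothing is evicted or expires. The first request to reach each edge is a miss, so the expected number of S3 fetches equals the expected number of distinct edges hit. The function names are hypothetical.

```python
def expected_origin_fetches(requests: int, edge_locations: int) -> float:
    """Expected number of distinct edges hit, i.e. expected S3 fetches.

    Assumes uniform random routing and no cache eviction (a simplification).
    """
    if requests < 0 or edge_locations < 1:
        raise ValueError("requests must be >= 0 and edge_locations >= 1")
    k = edge_locations
    # Each edge is missed by all requests with probability (1 - 1/k) ** requests.
    return k * (1.0 - (1.0 - 1.0 / k) ** requests)


def expected_hit_ratio(requests: int, edge_locations: int) -> float:
    """Expected fraction of requests served from an edge cache."""
    if requests == 0:
        return 0.0
    return 1.0 - expected_origin_fetches(requests, edge_locations) / requests


# 5 views spread over 50 edges: almost every view is a miss (hit ratio ~0.04).
spread_out = expected_hit_ratio(5, 50)
# 5 views that all reach the same edge: only the first is a miss (hit ratio 0.8).
clustered = expected_hit_ratio(5, 1)
```

Real traffic falls between these two extremes, since viewers of a given photo tend to cluster geographically, but with about five views per image even the best case still pays for one origin fetch per image per edge.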

    CloudFront would only issue GET requests against your bucket. You would still PUT objects directly into S3.

    If your users are USA-based, the "US-Standard" region of S3 geographically routes requests to servers in the eastern or western US depending on the user's apparent location. Buckets in other regions serve up all requests from servers in that region only.

    If your users are global, you could provision a bucket in each region and have your system store each user's images in the bucket closest to that user's registered location. The theory is that most viewers will be in the same general area as the uploader, and that viewers elsewhere in the world will tolerate the extra load time of fetching images from another continent when those images were shared by someone far away.
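
The per-region bucket idea could be sketched as a simple lookup. Everything here is illustrative: the bucket names, the country-to-region table, and the `bucket_for_user` helper are hypothetical, and a real application would need a much fuller mapping.

```python
# Hypothetical buckets, one provisioned per AWS region.
REGION_BUCKETS = {
    "us-east-1": "photos-example-us-east-1",
    "eu-west-1": "photos-example-eu-west-1",
    "ap-southeast-1": "photos-example-ap-southeast-1",
}

# Hypothetical mapping from a user's registered country to the nearest region.
COUNTRY_TO_REGION = {
    "US": "us-east-1",
    "CA": "us-east-1",
    "DE": "eu-west-1",
    "FR": "eu-west-1",
    "SG": "ap-southeast-1",
    "AU": "ap-southeast-1",
}

# Fallback for countries that are not in the table.
DEFAULT_REGION = "us-east-1"


def bucket_for_user(country_code: str) -> str:
    """Return the bucket where a user's uploads should be stored."""
    region = COUNTRY_TO_REGION.get(country_code.upper(), DEFAULT_REGION)
    return REGION_BUCKETS[region]


# Usage: a user registered in Germany stores photos in the EU bucket.
german_user_bucket = bucket_for_user("de")
```

The bucket is chosen once at upload time, so each image's URL has to record which bucket it lives in; viewers in other regions simply fetch it from there.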