amazon-web-services cdn amazon-cloudfront cname

Does using CNAME with Amazon CloudFront cause extra charge?


I created a CloudFront distribution for one of my sites the other day and I have been using a custom CNAME (cdn.site.com instead of whoa123.cloudfront.net). What I can't figure out is whether there is any charge for using this feature (the custom CNAME). I couldn't find anything saying whether a CNAME costs extra. Will I be billed extra for it, or will using it cause additional HTTP requests/hits against my quota/billing?

I'm also trying to understand: if a custom CNAME is free with a CloudFront distribution, why do many of the major websites I see (including dropbox.com) not use one and stick to the default cloudfront.net URL instead?

Thanks for any help you can provide.


Solution

  • As far as the Amazon documentation says, using a CNAME is free.

    Now, your actual question is why services like Dropbox and others do not use a custom CNAME.

    The answer is lookup resources. Each time someone (a browser, perhaps) asks for the file, it first does a lookup to find the file's host. It then finds that the file is hosted on CloudFront, and only then retrieves the file. Look at the process carefully:

    yoursite.com lookup > CloudFront lookup > file.

    It's a simple process for a small site. But guess what happens when you run a site that gets more than a million hits per day or per hour: the same process runs again and again, which consumes a whole lot of resources. And sometimes that many lookups come with extra charges. So they cut one lookup out of the equation. Now it looks like this:

    CloudFront lookup > file.

    That saves a lot of time and resources. For a small site, say below 100 thousand hits a day, a custom CNAME won't be fatal.

    How the Lookup Works

    A lookup is a simple and at the same time complex process. Let's explain it with a scenario.

    You type google.com in your browser and press enter. For the sake of the explanation, assume no DNS cache exists anywhere.

    Your browser asks your OS to find google.com. Your OS already has a DNS resolver configured. It usually comes from your ISP, though sometimes you use a third-party DNS resolver like Google DNS. Once the DNS resolver gets the request from the OS, it hands the request over to a root DNS server. There are 13 root DNS servers around the globe, spread across some 360 nodes.

    When the root server gets the request, it first checks the TLD. In this case, it's a .com domain. The root server knows the location of the handler of the .com domain, so it sends back an answer with the location of .com's handler. Your resolver then sends a request to .com's handler to see if it knows the location of google.com. Since .com is a gTLD (generic TLD), it's maintained by a commercial entity. This entity has a list of every registered .com domain and the location of its name servers. It sends the location of google.com's name servers to your resolver.

    Here, those are ns1.google.com, ns2.google.com, ns3.google.com and ns4.google.com. Your resolver now asks a Google name server for google.com's IP address. This is the authoritative name server that actually has the IP address of google.com, so it sends back the IP.

    Your resolver sends that IP back to your OS, and your OS passes it to your browser. Then the TCP handshake starts: your browser connects to the IP, and a connection is made between your computer and the physical server behind that IP. The server sends back HTML and your browser renders it for you.
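
    As a rough illustration of the walk above, here is a minimal Python sketch that simulates the three referrals (root, .com handler, authoritative name server) with plain dictionaries. Nothing here touches the network: the server tables, the `tld-com` label and the 192.0.2.1 address are made up for the example, and a real resolver does far more than this.

```python
# Simulated DNS hierarchy. All data below is hypothetical and only mirrors
# the shape of the walk described above; no real DNS queries are made.
ROOT = {"com": "tld-com"}  # root server: TLD -> handler of that TLD
SERVERS = {
    "tld-com": {"google.com": "ns1.google.com"},   # TLD handler: domain -> name server
    "ns1.google.com": {"google.com": "192.0.2.1"},  # authoritative: domain -> IP (example range)
}


def resolve_iteratively(domain):
    """Walk root -> TLD handler -> authoritative server; return (ip, steps)."""
    steps = []

    # Step 1: ask the root server who handles the TLD.
    tld = domain.rsplit(".", 1)[-1]
    tld_server = ROOT[tld]
    steps.append(("root", tld_server))

    # Step 2: ask the TLD handler for the domain's name server.
    auth_server = SERVERS[tld_server][domain]
    steps.append((tld_server, auth_server))

    # Step 3: ask the authoritative name server for the IP.
    ip = SERVERS[auth_server][domain]
    steps.append((auth_server, ip))

    return ip, steps


ip, steps = resolve_iteratively("google.com")
for asked, answer in steps:
    print(f"asked {asked}, got {answer}")
```

    Each tuple in `steps` is one question the resolver had to ask, so an uncached name costs three round trips before the browser can even open a connection.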

    In most cases this process is lengthy and risky, so there is a cache at each step. Your resolver already holds the IPs of frequently visited domains, so it can resolve them immediately.
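
    A minimal sketch of such a cache, assuming entries expire after a time-to-live (TTL), which is how DNS caches normally behave. The `TtlCache` class, its method names and the 60-second TTL are hypothetical and chosen only for illustration.

```python
import time


class TtlCache:
    """Tiny name -> IP cache whose entries expire after a TTL (in seconds)."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}   # name -> (ip, expiry timestamp)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def put(self, name, ip, ttl):
        self._entries[name] = (ip, self._clock() + ttl)

    def get(self, name):
        """Return the cached IP, or None if it is missing or has expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: the full lookup has to run again
            return None
        return ip


cache = TtlCache()
cache.put("cdn.yoursite.com", "203.0.113.10", ttl=60)  # example address
print(cache.get("cdn.yoursite.com"))
```

    While the entry is fresh, `get` answers immediately. Once it expires, the caller gets `None` and has to go through the whole lookup again.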

    The problem in your case: your resolver has probably cached cdn.yoursite.com and is calling the IP directly. But guess what happens when you have millions of visitors. There is a chance that, say, 15% of users visit while their DNS resolver is updating its cache. That means those 15% of requests have to follow the whole procedure I explained to get to your site. They first have to look up cdn.yoursite.com, and when that comes back with CloudFront's host, the same process runs again for CloudFront's IP. That is a huge loss of time and resources, because each lookup goes through four layers (resolver, root, TLD handler, authoritative name server), each of which costs memory and processing power on a physical CPU.
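
    To put rough numbers on that, here is a small back-of-the-envelope calculation. The 15% miss rate is the guess used above; the visitor count and the "two lookups with a CNAME, one without" figures are assumptions for the example, not measurements.

```python
def extra_lookups(visitors, miss_percent, lookups_with_cname=2, lookups_without=1):
    """Estimate the additional full lookups caused by the extra CNAME hop."""
    # Visitors whose resolver has no fresh cache entry.
    misses = visitors * miss_percent // 100
    # Each miss pays for one extra lookup chain when a CNAME is involved.
    return misses * (lookups_with_cname - lookups_without)


print(extra_lookups(1_000_000, 15))  # 150000
```

    So under these assumptions, a million visitors would trigger about 150,000 additional full lookups compared with using the cloudfront.net hostname directly.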

    On top of that, some DNS resolvers cache only one layer of IP. A CNAME is actually a host declaration: it tells the resolver that this particular name points somewhere else. That "somewhere else" is another request. So even with a cache, it runs two queries against the resolver's cache. And if there isn't any cache, you get the four layers of pain described above.
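
    The two-queries point can be sketched with a toy caching resolver that counts how often it has to go upstream and how often the cache answers. The `CachingResolver` class and the record table are hypothetical (whoa123.cloudfront.net is the hostname from the question, and the IP is an example address), and the cache has no expiry to keep the sketch short.

```python
# Hypothetical upstream records: a CNAME pointing at the CloudFront host,
# and an A record for that host.
UPSTREAM = {
    "cdn.yoursite.com": ("CNAME", "whoa123.cloudfront.net"),
    "whoa123.cloudfront.net": ("A", "203.0.113.10"),
}


class CachingResolver:
    """Toy resolver that follows CNAMEs and counts cache hits vs upstream queries."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}
        self.cache_hits = 0
        self.upstream_queries = 0

    def _lookup(self, name):
        if name in self.cache:
            self.cache_hits += 1
            return self.cache[name]
        self.upstream_queries += 1  # stands in for the full uncached walk
        record = self.upstream[name]
        self.cache[name] = record
        return record

    def resolve(self, name, max_hops=8):
        for _ in range(max_hops):
            rtype, value = self._lookup(name)
            if rtype == "A":
                return value
            name = value  # CNAME: look up the target name next
        raise RuntimeError("CNAME chain too long")


resolver = CachingResolver(UPSTREAM)
resolver.resolve("cdn.yoursite.com")  # cold cache
print(resolver.upstream_queries, resolver.cache_hits)  # 2 0
resolver.resolve("cdn.yoursite.com")  # warm cache
print(resolver.upstream_queries, resolver.cache_hits)  # 2 2
```

    With a cold cache the custom name costs two upstream queries instead of one, and even with a warm cache the resolver still consults its cache twice per request.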

    I hope I explained it correctly. My head is already spinning after writing this!