amazon-web-services amazon-ec2 data-analysis amazon-vpc

What are the different use cases for AWS VPC in the area of Data Analytics?


I am new to AWS VPC and am exploring what it offers. I understand that a VPC is mainly used to provide a secure, isolated environment. What are the different use cases for AWS VPC in the area of Data Analytics? My current data lake pipeline is as follows:

  1. Extract data using APIs
  2. Store raw data in S3
  3. Create Lambda functions or Glue Jobs to compute business metrics
  4. Store metric outputs in S3
  5. Create tables in Athena for all the data stored in S3
  6. Import tables into QuickSight to produce business insights from visuals

Solution

  • The services you mention (mostly) live outside of VPCs.

    VPCs are used for services that run on virtual computers, such as Amazon EC2 instances and Amazon RDS databases.

    By using services that don't involve specific 'computers' (such as Amazon S3, Athena, QuickSight) you can take advantage of much lower costs, paying only for what you use. These services do not mimic traditional servers and therefore don't need VPCs. All the networking complexity is hidden, so you can concentrate on using the service instead of running a network.

    Yes, VPCs add extra security, but only because resources in a VPC need securing against potential security holes. The services you mention are all secured via IAM and expose nothing beyond their published APIs.
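
To make the "secured via IAM" point concrete, here is a minimal sketch of the kind of IAM policy document that could scope a Lambda function or Glue job in a pipeline like the one in the question. The bucket names and the `build_pipeline_policy` helper are hypothetical, invented purely for illustration. The S3 actions and ARN format are standard, but the exact permissions a real pipeline needs will differ.

```python
import json

# Hypothetical bucket names, used only for illustration.
RAW_BUCKET = "example-raw-data"
METRICS_BUCKET = "example-metric-outputs"


def build_pipeline_policy(raw_bucket: str, metrics_bucket: str) -> dict:
    """Build an IAM policy document (as a dict) that allows reading raw
    objects, writing metric outputs, and listing both buckets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Step 3 of the pipeline reads the raw data.
                "Sid": "ReadRawData",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{raw_bucket}/*"],
            },
            {
                # Step 4 of the pipeline writes the metric outputs.
                "Sid": "WriteMetricOutputs",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{metrics_bucket}/*"],
            },
            {
                # Listing is granted on the bucket ARN, not on object ARNs.
                "Sid": "ListBothBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{raw_bucket}",
                    f"arn:aws:s3:::{metrics_bucket}",
                ],
            },
        ],
    }


policy = build_pipeline_policy(RAW_BUCKET, METRICS_BUCKET)
print(json.dumps(policy, indent=2))
```

Access here is controlled entirely by which principals hold this policy. No subnet, route table, or security group is involved, which is why these services need no VPC of their own.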