amazon-web-services, network-programming, cloud, etl

Is an ETL instance supposed to be in private or public subnet?


I am currently working on an ETL tool at work (basically Python and bash scripts managed with Airflow), and I am wondering whether I should put the EC2 instance that will run the ETL in a public or a private subnet. The instance needs access to the internet to retrieve data (basically over SSH from on-premises instances we have) and should also be reachable through SSH.

However, I don't know whether allowing outbound connections to the internet and restricting inbound connections to SSH is enough from a security standpoint, or whether I should put the instance in a private subnet and tweak things so that I can still connect to it.


Solution

  • Your ETL instance should be in a private subnet behind a NAT gateway. A NAT gateway gives the instances in your private subnet outbound internet connectivity while ensuring that they cannot be reached from the internet. The NAT gateway itself sits in a public subnet, that is, a subnet whose route table has a route to an Internet gateway, and the private subnet's route table sends internet-bound traffic (0.0.0.0/0) to the NAT gateway. Because the instance cannot be reached directly from the internet, its attack surface is much smaller than it would be in a public subnet.

    You can learn how to set up a NAT gateway here: https://aws.amazon.com/premiumsupport/knowledge-center/nat-gateway-vpc-private-subnet/
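
As a rough illustration of the routing described above, here is a minimal sketch using boto3. It assumes you already have a VPC with a public subnet (its route table points at an Internet gateway) and a separate route table for the private subnet. The function name `add_nat_for_private_subnet` and the placeholder IDs in the usage comment are hypothetical, and this is only one way to wire it up, not the only one.

```python
def add_nat_for_private_subnet(ec2, public_subnet_id, private_route_table_id):
    """Create a NAT gateway in the public subnet and point the private
    subnet's default route at it.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Both IDs are assumed to refer to resources that already exist.
    """
    # A public NAT gateway needs an Elastic IP address.
    eip = ec2.allocate_address(Domain="vpc")

    # The NAT gateway is created in the PUBLIC subnet.
    nat = ec2.create_nat_gateway(
        SubnetId=public_subnet_id,
        AllocationId=eip["AllocationId"],
    )
    nat_id = nat["NatGateway"]["NatGatewayId"]

    # Routes to a NAT gateway can only be added once it is available.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

    # Send internet-bound traffic from the private subnet through the NAT.
    ec2.create_route(
        RouteTableId=private_route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_id,
    )
    return nat_id


# Example usage (placeholder IDs, requires boto3 and AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   add_nat_for_private_subnet(ec2, "subnet-PUBLIC", "rtb-PRIVATE")
```

Note that this only covers outbound traffic. Nothing here opens an inbound path to the instance, so SSH access to it has to come from inside your network.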