We have a hybrid setup in which our on-premises network is connected to AWS via a site-to-site VPN. For security reasons, we need to download data from S3 to on-premises so that the traffic goes from on-premises to AWS and back without crossing the public Internet. That is, something like this:
on-prem --VPN--> AWS private subnet --> s3 endpoint --> s3
This scheme works with interface endpoints, because they provide private DNS names that can be called from on-premises. However, the S3 endpoint is a gateway endpoint, not an interface endpoint, so it doesn't provide private DNS names.
How can this be achieved?
In February 2021, AWS released S3 PrivateLink interface endpoints, which are different from S3 gateway endpoints.
The difference is that S3 interface endpoints resolve to private IP addresses in your VPC and are routable from outside the VPC (e.g. via VPN, Direct Connect, or Transit Gateway). S3 gateway endpoints use public S3 IP ranges and are only reachable from resources within the VPC.
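One way to confirm this from an on-premises host is to resolve the endpoint's DNS name and check that every returned address falls in a private range. This is a minimal sketch, assuming Python 3.9+ and only the standard library; the function names are hypothetical, and the hostname you pass in should be your own endpoint's DNS name.

```python
import ipaddress
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the distinct IPv4 addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True only if there is at least one address and every address is private
    (e.g. inside a VPC CIDR such as 10.0.0.0/16)."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )
```

An interface endpoint's DNS name should pass `all_private(resolve_ipv4(...))`, while the regular public S3 hostname, which is what a gateway endpoint relies on, resolves to public addresses and should fail the check.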
With an interface endpoint, you can reach S3 buckets from your on-premises network over your VPN, through the endpoint's network interfaces in one or more VPC subnets, without needing a proxy in the VPC and without traversing the public Internet.
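The S3 PrivateLink guide shows clients addressing the interface endpoint through its endpoint-specific DNS name, with the leading `*` of the endpoint's DNS name (`*.vpce-<id>.s3.<region>.vpce.amazonaws.com`) replaced by `bucket` for bucket operations. The sketch below builds that URL and downloads one object with boto3. It assumes boto3 is installed and that the endpoint's security group allows HTTPS (port 443) from your on-premises CIDR. The helper names, and the region, bucket, key, and endpoint values you would pass, are hypothetical placeholders.

```python
def bucket_endpoint_url(endpoint_dns_name: str) -> str:
    """Turn an S3 interface endpoint DNS name such as
    '*.vpce-<id>.s3.<region>.vpce.amazonaws.com' into the HTTPS URL
    used for bucket operations ('https://bucket.vpce-<id>...')."""
    name = endpoint_dns_name.strip()
    if name.startswith("*."):
        name = name[2:]  # drop the wildcard label
    if not name.startswith("bucket."):
        name = "bucket." + name
    return "https://" + name


def download_via_endpoint(endpoint_dns_name, region, bucket, key, dest_path):
    """Download s3://bucket/key to dest_path through the interface endpoint."""
    # Imported lazily so the URL helper above works without boto3 installed.
    import boto3

    s3 = boto3.client(
        "s3",
        region_name=region,
        endpoint_url=bucket_endpoint_url(endpoint_dns_name),
    )
    # Requests go to the endpoint's private IPs in the VPC, reached over the VPN.
    s3.download_file(bucket, key, dest_path)
```

Pointing `endpoint_url` at the endpoint-specific name, rather than relying on the default S3 hostname, is what keeps the client from resolving S3's public addresses.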
Refer to the blog announcement and the S3 PrivateLink user guide for more details.