I need to build a data lake on AWS, but I don't understand exactly how S3 differs from HDFS. I found some answers on the Internet, but I still don't see the real difference.
I would also like to know whether anyone has a data lake architecture that uses both HDFS and S3 on AWS.
HDFS is only accessible to the Hadoop cluster in which it exists. If that cluster is shut down or terminated, the data stored in its HDFS is gone.
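For a data lake, this lifecycle difference is what drives the design. Data that must persist belongs in S3, while HDFS is only suitable for data you can afford to lose, such as intermediate or scratch output. The sketch below is a hypothetical sanity check, not an AWS or Hadoop API: it classifies storage URIs by scheme and flags any lake zone whose data would disappear with the cluster. The zone names, bucket name and NameNode host are made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical rule table based on the point above: HDFS lives on the
# cluster's own nodes, so its data is lost when the cluster is terminated,
# while objects in Amazon S3 exist independently of any cluster.
PERSISTENT_SCHEMES = {"s3", "s3a"}
CLUSTER_LOCAL_SCHEMES = {"hdfs"}


def survives_cluster_termination(uri: str) -> bool:
    """Return True if data at `uri` outlives the Hadoop cluster that wrote it."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in PERSISTENT_SCHEMES:
        return True
    if scheme in CLUSTER_LOCAL_SCHEMES:
        return False
    raise ValueError(f"unknown storage scheme: {scheme!r}")


def check_lake_paths(paths: dict[str, str]) -> list[str]:
    """Return the names of lake zones whose storage would vanish with the cluster."""
    return [name for name, uri in paths.items() if not survives_cluster_termination(uri)]


if __name__ == "__main__":
    # Hypothetical zone names, bucket and NameNode, for illustration only.
    lake = {
        "raw": "s3://example-lake/raw/",
        "curated": "s3://example-lake/curated/",
        "scratch": "hdfs://namenode:8020/tmp/etl/",
    }
    print(check_lake_paths(lake))  # ['scratch']
```

Here only the `scratch` zone is flagged. That is acceptable if it holds temporary job output, but it would be a problem if the zone were meant to be durable.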
Data in Amazon S3: