I understand that EFS can be mounted on multiple EC2 instances.
Is it possible to connect to AWS EFS from multiple Hadoop clusters?
Or is it attached to a specific cluster?
Can we connect to EFS from outside the Hadoop clusters using an API?
You are using a Cloudera distribution for your Hadoop cluster, so you can configure whatever you wish.
As a comparison, users of Amazon EMR (the AWS managed Hadoop service) normally choose between two types of storage: HDFS on disks attached to the cluster nodes, or Amazon S3 accessed via EMRFS.
For EMR (again, not your situation), users keep input and output data in Amazon S3 as a persistent data store. This way, data is not lost when the cluster is terminated. The benefit is that clusters can be turned off when they are not being used (hence saving money) and additional clusters can be spun up when more processing power is required. This is not possible in a traditional on-premises setup, where clusters are permanently kept on and cannot be scaled up or down.
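As a rough sketch of that idea (and of your "access from outside the cluster" question), here is what reading S3 output from an external program might look like with boto3. The bucket and key names are placeholders, and credentials are assumed to come from the environment or an instance role:

```python
# Minimal sketch: data written to S3 persists after the cluster is terminated
# and can be read by any external program. Bucket/prefix names are placeholders.
import boto3

s3 = boto3.client("s3")  # credentials from environment variables or instance role

# List the objects a previous (now-terminated) cluster wrote as job output
response = s3.list_objects_v2(Bucket="example-datalake-bucket", Prefix="jobs/output/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Fetch one output file directly, with no Hadoop cluster running at all
body = s3.get_object(
    Bucket="example-datalake-bucket", Key="jobs/output/part-00000"
)["Body"].read()
print(body[:200])
```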
So, back to your Cloudera cluster... You will probably be using HDFS for your storage, in which case you would want locally attached disk storage (EBS volumes or instance store). You also have the option of using Amazon S3 for data storage (via the s3a connector), which can work out cheaper than disk storage.
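If you go the S3 route from your own cluster, a job might look something like the sketch below. This assumes the hadoop-aws/s3a connector is on the classpath and credentials are supplied via instance roles or fs.s3a.* settings; the bucket paths and column name are placeholders, not anything from your setup:

```python
# Sketch of a Spark job on a self-managed (e.g. Cloudera) cluster reading from
# and writing to S3 through the s3a connector instead of HDFS.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3a-read-example").getOrCreate()

# Read input directly from S3 rather than HDFS
df = spark.read.csv("s3a://example-datalake-bucket/input/events/", header=True)
df.groupBy("event_type").count().show()

# Write results back to S3 so they outlive the cluster
df.write.mode("overwrite").parquet("s3a://example-datalake-bucket/output/event_counts/")
spark.stop()
```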
Yes, you could mount Amazon EFS file systems via NFS, but EFS is normally used for sharing storage between EC2 instances, and this is not the way that HDFS operates (it assumes locally attached disks, with the distributed sharing handled at the DataNode level).
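To make the contrast concrete, here is a small sketch of how EFS is typically used once it is NFS-mounted (the /mnt/efs path is an assumed placeholder): every attached instance sees it as ordinary POSIX storage and uses plain file I/O, which is quite different from HDFS block storage:

```python
# EFS after an NFS mount behaves like a normal shared filesystem.
from pathlib import Path

shared = Path("/mnt/efs/shared")          # assumed mount point, placeholder
shared.mkdir(parents=True, exist_ok=True)

# Written on one instance...
(shared / "status.txt").write_text("job-42 finished\n")

# ...readable immediately on any other instance with the same mount
print((shared / "status.txt").read_text())
```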
I would recommend investigating whether you could use Amazon EMR instead of deploying your own Hadoop cluster due to the benefits of scaling, transient clusters, automatic deployment and regular upgrades. If you must use Cloudera, you will be responsible for managing and maintaining the cluster yourself.