Search code examples
hadoophdfsdistributed-filesystemalluxio

Alluxio with/without HDFS


I have a cluster with HDFS as an under storage distributed file system, but I've just read about alluxio that is fast and flexible. So, My question is: Should I use Alluxio with HDFS or Alluxio is alternative for HDFS? (I see in their site that shared storage for under storage file system can be network file system (NFS). So, I think HDFS is not required. Correct me if I make a mistake).

In which mode performance is better: HDFS with Alluxio or Alluxio stanalone (what I mean the term standalone is to be used alone in the cluster and not locally).


Solution

  • Reply from Alluxio maintainer.

    First of all, Alluxio is not a replacement for HDFS. Instead, it is a new abstraction layer on top of other distributed/cloud storage systems including HDFS, S3, Azure Object Store and other possible choices. In your case, if you data is already in HDFS, you will perhaps still keep HDFS as the persistent data layer for Alluxio.

    The typical scenarios users put Alluxio in the picture and see significant benefits include:

    • Your physical data is not located with your compute. E.g., your bigdata engine is reading data from S3 or other object storage. In this case, by deploying Alluxio with compute nodes, one can make Alluxio work as a filesystem level cache to avoid fetching data across network repeatedly. See http://www.alluxio.org/overview/remote-data-acceleration
    • You are managing multiple storages and want to expose a single data access layer to simplify the management. E.g., one can "mount" multiple S3/ buckets into one Alluxio deployment so they appear as different directories under the same namespace. See http://www.alluxio.org/overview/storage-unification

    Regarding your original performance question. The answer is, it depends. If your HDFS is remote from compute, you would expect a good performance gain. I also saw cases when HDFS is bottlenecked, Alluxio may also help to reduce the load and provides good SLA for certain mission-critical jobs.