I'm doing a bit of research on ZFS and whether it could be used as a component of a distributed processing framework. The main question I'm trying to answer is: will Apache Spark run in an efficient, distributed fashion if the data resides in ZFS?
I.e., Spark on HDFS has the concept of data locality; can the same be said of ZFS?
Can it be run with ZFS as a local file system? By all means. ZFS is POSIX compliant, so there is no blocker here.
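Can it be used as a replacement for a distributed file system? Definitely not. ZFS is a file system and volume manager, not distributed storage. A pool is mounted on a single host, so there is no cluster-wide view of block locations for Spark to schedule against, which is what HDFS data locality relies on.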
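
For the local file system case, here is a minimal sketch of what reading from a ZFS mount could look like in PySpark. The mount point `/tank/data/events` and the helper names `local_uri` and `count_lines` are made up for illustration. Spark simply sees a POSIX path behind a `file://` URI and has no idea that ZFS is underneath.

```python
from pathlib import PurePosixPath


def local_uri(mount_path: str) -> str:
    """Turn an absolute POSIX path into a file:// URI.

    Raises ValueError if the path is not absolute.
    """
    return PurePosixPath(mount_path).as_uri()


def count_lines(mount_path: str) -> int:
    """Count the lines of the text files under mount_path using Spark."""
    # Imported lazily so local_uri() works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("zfs-local-read").getOrCreate()
    try:
        # The file:// scheme makes Spark use the local file system,
        # not HDFS or whatever the default file system is.
        return spark.read.text(local_uri(mount_path)).count()
    finally:
        spark.stop()


if __name__ == "__main__":
    # Hypothetical ZFS dataset mount point; replace with a real one.
    print(count_lines("/tank/data/events"))
```

On a single machine this works as is. On a cluster, a `file://` path has to be readable at the same location on every worker node, and ZFS does nothing to arrange that for you.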