Tags: hadoop, hdfs, bigdata, hortonworks-data-platform, apache-falcon

Falcon's role in Hadoop ecosystem


I am supposed to work on cluster mirroring: I have to set up an HDFS cluster similar to an existing one (same master and slave layout), copy the data over to the new cluster, and then run the same jobs on it as-is.

I have read that Falcon is a feed-processing and workflow-coordination tool, and that it is also used for mirroring HDFS clusters. Can someone explain Falcon's role in the Hadoop ecosystem, and how it helps with mirroring in particular? I am trying to understand everything Falcon offers when it is part of my Hadoop ecosystem (HDP).


Solution

    • Apache Falcon simplifies the configuration of data motion by providing replication, lifecycle management, and lineage/traceability. This gives you consistent data governance across Hadoop components.
    • Falcon replication is asynchronous and ships delta changes. Recovery is done by running a recovery process and swapping the source and target clusters.
    • Data loss: delta data that has not yet been replicated may be lost if the primary cluster shuts down completely.
    • Backups can be scheduled as needed, depending on bandwidth and network availability.
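To make the replication point concrete: in Falcon you describe your data as entities, and replication falls out of declaring a feed on two clusters. The sketch below is a hypothetical feed definition, assuming cluster entities named primaryCluster and backupCluster have already been submitted; all names, paths, dates, and frequencies are placeholders, not values from the question.

```xml
<!-- Hypothetical Falcon feed entity: replicate hourly data from a source
     cluster to a target cluster. Names and paths are placeholders. -->
<feed name="mirrorFeed" description="HDFS mirror feed" xmlns="uri:falcon:feed:0.1">
    <!-- How often a new instance of the feed materializes -->
    <frequency>hours(1)</frequency>
    <clusters>
        <!-- Source cluster: where the data is produced -->
        <cluster name="primaryCluster" type="source">
            <validity start="2016-01-01T00:00Z" end="2099-12-31T00:00Z"/>
            <retention limit="days(90)" action="delete"/>
        </cluster>
        <!-- Target cluster: Falcon replicates each new instance here -->
        <cluster name="backupCluster" type="target">
            <validity start="2016-01-01T00:00Z" end="2099-12-31T00:00Z"/>
            <retention limit="days(90)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <!-- Date-partitioned HDFS path; Falcon substitutes the variables -->
        <location type="data" path="/data/events/${YEAR}-${MONTH}-${DAY}"/>
    </locations>
    <ACL owner="hdfs" group="hadoop" permission="0755"/>
    <schema location="/none" provider="none"/>
</feed>
```

Once the two cluster entities and this feed are submitted and scheduled (via the Falcon CLI's `falcon entity -submit` and `falcon entity -schedule` commands), Falcon takes care of copying each new instance from the source to the target and of enforcing the retention policy on both sides.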