hadoop · hdfs · replication

Hadoop: Is it possible to avoid replication for certain files?


In HDFS, as I understand it, all files are replicated. However, our jobs produce log files that we do not want replicated, since maintaining extra copies of them is unnecessary. Is it possible to avoid replication for only the log files?


Solution

  • You can set the replication factor with the -setrep flag of the hadoop fs shell command.

    Usage: hadoop fs -setrep [-R] [-w] <numReplicas> <path>
    
    Changes the replication factor of a file. If path is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at path.
    
    Options:
    
    The -w flag requests that the command wait for the replication to complete. This can potentially take a very long time.
    The -R flag is accepted for backwards compatibility. It has no effect.
    Example:
    
    hadoop fs -setrep -w 3 /user/hadoop/dir1
    

    To avoid extra replicas, set numReplicas to 1, which keeps a single copy of each file.
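    Putting this together, a minimal sketch for the log-file case (the path /user/hadoop/logs is a hypothetical example; substitute your actual log directory):

    ```shell
    # Set the replication factor of every file under the log directory to 1
    # (a single copy, no extra replicas). Directories are recursed automatically.
    hadoop fs -setrep 1 /user/hadoop/logs

    # Verify: %r prints the replication factor of a file
    # (job.log is a hypothetical file name).
    hdfs dfs -stat %r /user/hadoop/logs/job.log

    # Note: files written later still inherit the cluster default (dfs.replication).
    # To write a new file with replication 1 from the start, override the
    # property for that command:
    hadoop fs -D dfs.replication=1 -put job.log /user/hadoop/logs/
    ```

    Keep in mind that a replication factor of 1 trades durability for space: if the DataNode holding the single copy fails, that log file is lost.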