
WORM principle of Hadoop: what exactly does it mean?


Hadoop works on the WORM (write once, read many) principle. Then why does Hue let me edit a file? I created a file in HDFS (CDH), say employee.txt. I was under the impression that, according to the WORM principle, employee.txt should not be editable. But when I open the file in Hue and choose Edit file, I can edit the existing content. What does the WORM principle mean, then?


Solution

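  • This is because Hue does not edit the file in place. When you save, it:

    1. writes the new contents to a temporary file
    2. removes the old file
    3. renames the temporary file to the original name

    HDFS allows each of these operations (creating a new file, deleting a file, renaming a file), so the bytes of the existing file are never modified. Hue does not break the WORM principle; it works around it by replacing the file instead of editing it.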

    The relevant function, _do_overwrite, is in https://github.com/cloudera/hue/blob/master/desktop/libs/hadoop/src/hadoop/fs/fsutils.py:

    import logging
    import stat as stat_module

    # fs: Hue's filesystem client.
    # copy_data: a callback that writes the new contents to the path it is given.
    def _do_overwrite(fs, path, copy_data):
        """
        Atomically (best-effort) save the specified data to the given path
        on the filesystem.
        """
        # TODO(todd) Should probably do an advisory permissions check here to
        # see if we're likely to fail (eg make sure we own the file
        # and can write to the dir)

        # First write somewhat-kinda-atomically to a staging file
        # so that if we fail, we don't clobber the old one
        path_dest = path + "._hue_new"

        # Copy the data to destination
        copy_data(path_dest)

        # Try to match the permissions and ownership of the old file
        cur_stats = fs.stats(path)
        try:
            fs.do_as_superuser(fs.chmod, path_dest, stat_module.S_IMODE(cur_stats['mode']))
        except Exception:
            logging.exception("Could not chmod new file %s to match old file %s", path_dest, path)
            # but not the end of the world - keep going

        try:
            fs.do_as_superuser(fs.chown, path_dest, cur_stats['user'], cur_stats['group'])
        except Exception:
            logging.exception("Could not chown new file %s to match old file %s", path_dest, path)
            # but not the end of the world - keep going

        # Now delete the old - nothing we can do here to recover
        fs.remove(path, skip_trash=True)

        # Now move the new one into place
        # If this fails, then we have no reason to assume
        # we can do anything to recover, since we know the
        # destination shouldn't already exist (we just deleted it above)
        fs.rename(path_dest, path)
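To see why this sequence is allowed even under write-once semantics, here is a toy sketch. It is not Hue or HDFS code: WormFileSystem and overwrite are hypothetical names, and the "file system" is an in-memory dict that refuses to write to a path that already holds a file. Its rename also refuses to overwrite an existing destination, an assumption meant to mirror HDFS, where renaming onto an existing file fails; that is why the old file has to be removed before the rename.

```python
class WormFileSystem:
    """Hypothetical in-memory file system with write-once semantics.

    A file can be created once and its bytes can never be changed afterwards.
    Deleting and renaming whole files are still allowed.
    """

    def __init__(self):
        self._files = {}  # maps path -> bytes

    def create(self, path, data):
        # Write-once rule: never write to a path that already holds a file.
        if path in self._files:
            raise PermissionError(f"{path} already exists; files are write-once")
        self._files[path] = bytes(data)

    def read(self, path):
        return self._files[path]

    def exists(self, path):
        return path in self._files

    def remove(self, path):
        del self._files[path]

    def rename(self, src, dst):
        # Assumption: like HDFS, rename will not overwrite an existing file.
        if dst in self._files:
            raise FileExistsError(dst)
        self._files[dst] = self._files.pop(src)


def overwrite(fs, path, new_data):
    """Replace a file the way Hue does: temp file, delete, rename (hypothetical helper)."""
    staging = path + "._hue_new"
    fs.create(staging, new_data)  # 1. write the new contents to a temporary file
    fs.remove(path)               # 2. remove the old file
    fs.rename(staging, path)      # 3. rename the temporary file into place


fs = WormFileSystem()
fs.create("/user/demo/employee.txt", b"alice,engineering\n")

# Editing in place is rejected by the write-once rule...
try:
    fs.create("/user/demo/employee.txt", b"bob,sales\n")
except PermissionError as err:
    print("direct write rejected:", err)

# ...but replacing the whole file with the three-step sequence succeeds.
overwrite(fs, "/user/demo/employee.txt", b"bob,sales\n")
print(fs.read("/user/demo/employee.txt"))
```

The price of this approach is that it is not truly atomic: between the remove and the rename the path does not exist, and if the rename fails the old contents are already gone. The comments in _do_overwrite acknowledge this, which is why it describes itself as only best-effort atomic.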