Hadoop works on the WORM (write once, read many) principle. Then why does Hue let me edit a file? I created a file in HDFS (CDH), say employee.txt. I was under the impression that employee.txt should not be editable, according to the WORM principle. But when I open the file using Hue -> Edit file, I can edit the existing content. What is the point of the WORM principle then?
This is because Hue does not edit the file in place. It writes the new content to a staging file, deletes the original, and then renames the staging file to the original path. This gets around the WORM principle: every "edit" is actually a fresh write of a brand-new file, and the old file is gone.
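The same replace-don't-modify pattern is easy to demonstrate outside Hue. Here is a minimal sketch of the idea against a local filesystem (the function name and ".new" staging suffix are my own; only the sequence of steps mirrors Hue's code below):

import os

def overwrite(path, new_data):
    # Write the new content to a staging file first, so a failure
    # never clobbers the original (Hue uses a "._hue_new" suffix).
    staging = path + ".new"
    with open(staging, "w") as f:
        f.write(new_data)
    # Delete the old file, then move the replacement into place.
    # The original file is never modified; it is simply replaced.
    os.remove(path)
    os.rename(staging, path)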
The actual Hue code, from https://github.com/cloudera/hue/blob/master/desktop/libs/hadoop/src/hadoop/fs/fsutils.py:
import logging
import stat as stat_module  # module-level imports used by the function below

def _do_overwrite(fs, path, copy_data):
  """
  Atomically (best-effort) save the specified data to the given path
  on the filesystem.
  """
  # TODO(todd) Should probably do an advisory permissions check here to
  # see if we're likely to fail (eg make sure we own the file
  # and can write to the dir)

  # First write somewhat-kinda-atomically to a staging file
  # so that if we fail, we don't clobber the old one
  path_dest = path + "._hue_new"

  # Copy the data to destination
  copy_data(path_dest)

  # Try to match the permissions and ownership of the old file
  cur_stats = fs.stats(path)
  try:
    fs.do_as_superuser(fs.chmod, path_dest, stat_module.S_IMODE(cur_stats['mode']))
  except:
    logging.exception("Could not chmod new file %s to match old file %s" % (path_dest, path))
    # but not the end of the world - keep going

  try:
    fs.do_as_superuser(fs.chown, path_dest, cur_stats['user'], cur_stats['group'])
  except:
    logging.exception("Could not chown new file %s to match old file %s" % (path_dest, path))
    # but not the end of the world - keep going

  # Now delete the old - nothing we can do here to recover
  fs.remove(path, skip_trash=True)

  # Now move the new one into place
  # If this fails, then we have no reason to assume
  # we can do anything to recover, since we know the
  # destination shouldn't already exist (we just deleted it above)
  fs.rename(path_dest, path)
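Note the ordering of the steps: the staging copy is fully written, chmodded, and chowned before the original is removed, so the window in which the path has no file at all is kept as short as possible. To drive a helper like this you supply a copy_data callback that writes the new contents to the staging path. A sketch of such a caller follows (the do_save name and the fs.open signature are assumptions for illustration, inferred from the snippet above rather than taken from a documented API):

def do_save(fs, path, data):
    # copy_data receives the staging path and writes the new bytes there.
    def copy_data(path_dest):
        f = fs.open(path_dest, "w")
        try:
            f.write(data)
        finally:
            f.close()
    _do_overwrite(fs, path, copy_data)

So from HDFS's point of view nothing was ever modified in place: one file was created, one was deleted, and a rename happened. The WORM guarantee applies to each individual file's contents, not to a path name.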