
Locking a directory in HDFS


Is there a way to acquire lock on a directory in HDFS? Here's what I am trying to do:

I have a directory called ../latest/...

Every day I need to add fresh data to this directory, but before I copy the new data in, I want to acquire a lock so that no one is reading the directory while I write to it.

Is there a way to do this in HDFS?


Solution

  • No, there is no way to do this through HDFS.

    In general, when I have this problem, I copy the data into a random temporary location and then move it into place once the copy is complete. This works well because `mv` is nearly instantaneous (a rename in HDFS is a metadata-only operation), while copying takes much longer. That way, if you check whether anyone else is writing and then `mv`, the window during which the "lock" must be held is much shorter.

    1. Generate a random number.
    2. Copy the data into a new folder at /tmp/$randomnumber in HDFS.
    3. Check that the destination is OK (hadoop fs -ls, perhaps).
    4. hadoop fs -mv the data into the latest directory.
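    The steps above can be sketched as a short script. The version below uses local directories and plain `mkdir`/`cp`/`mv` as stand-ins for the HDFS commands so it can run anywhere; on a real cluster you would substitute `hadoop fs -mkdir`, `hadoop fs -put`, and `hadoop fs -mv`, and all directory names here are made up for illustration.

    ```shell
    #!/bin/sh
    # Sketch of the stage-then-rename pattern. Local filesystem commands
    # stand in for HDFS: on a cluster, replace mkdir/cp/mv with
    # hadoop fs -mkdir / hadoop fs -put / hadoop fs -mv.
    set -e

    BASE=$(mktemp -d)                   # stand-in for the HDFS root
    echo "fresh data" > "$BASE/data.txt"

    # 1. Generate a random number to name the staging directory.
    STAGING="$BASE/tmp-$RANDOM-$$"

    # 2. Copy the new data into the staging directory (the slow step).
    mkdir -p "$STAGING"
    cp "$BASE/data.txt" "$STAGING/"

    # 3. Check that the destination looks OK (the hadoop fs -ls step).
    DEST="$BASE/latest"
    if [ -d "$DEST" ]; then
        echo "destination already exists; aborting"
        exit 1
    fi

    # 4. Rename the staged directory into place (the fast step).
    mv "$STAGING" "$DEST"
    ```

    The expensive copy happens entirely in the staging directory, so readers of the destination only ever race against the final rename.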

    There is a slim chance that between steps 3 and 4 someone else might clobber something. If that really makes you nervous, you can implement a proper lock in ZooKeeper; Curator can help you with that.
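    If pulling in ZooKeeper and Curator feels too heavy, one lightweight alternative (not mentioned in the answer above, and with known weaknesses such as stale locks left behind by a crashed writer) exploits the fact that directory creation is atomic: creating a directory that already exists fails, so whichever writer creates the agreed lock path first wins. A local sketch, with plain `mkdir` standing in for `hadoop fs -mkdir` and a hypothetical lock path:

    ```shell
    #!/bin/sh
    # Poor-man's lock via atomic directory creation: mkdir either creates
    # the directory or fails because it already exists, so at most one
    # writer "holds" the lock. On HDFS the equivalent would be
    # hadoop fs -mkdir on an agreed lock path (hypothetical example:
    # /locks/latest.lock).
    set -e
    LOCKDIR=$(mktemp -d)/latest.lock    # stand-in for the HDFS lock path

    if mkdir "$LOCKDIR" 2>/dev/null; then
        echo "lock acquired"
        # ... stage the new data and mv it into place here ...
        rmdir "$LOCKDIR"                # release the lock when done
    else
        echo "someone else holds the lock"
    fi
    ```

    A second writer running the same `mkdir` while the lock directory exists would take the else branch and back off. ZooKeeper remains the more robust choice, since its ephemeral nodes disappear automatically if the lock holder dies.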