python · hadoop · fabric

Check if a file exists in HDFS from Python


So, I've been using the fabric package in Python to run shell scripts for various HDFS tasks.

However, whenever I run a task that checks whether a file or directory already exists in HDFS, the whole script aborts. Here is an example (I am using Python 3.5.2 and Fabric3==1.12.post1):

from fabric.api import local


local('hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/')

If the directory does not exist, this code yields

[localhost] local: hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/
stat: `hdfs://some/nonexistent/hdfs/dir/': No such file or directory

Fatal error: local() encountered an error (return code 1) while executing 'hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/'

Aborting.

I also tried local('hadoop fs -test -e hdfs://some/nonexistent/hdfs/dir/'), but it aborted in the same way.

How can I use fabric to get a boolean telling me whether a file or directory exists in HDFS?


Solution

  • Wrap the call in settings(warn_only=True) so that a non-zero exit code doesn't abort the script, then check the succeeded flag of the result object returned by local.

    from fabric.api import local
    from fabric.context_managers import settings
    
    file_exists = False
    # warn_only stops fabric from aborting when the command exits non-zero
    with settings(warn_only=True):
        # capture=True makes local() return a result object with .succeeded / .failed
        result = local('hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/', capture=True)
        file_exists = result.succeeded
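If you'd rather not depend on fabric, the same exit-code convention can be checked with the standard library. The sketch below is an illustration, not the answer's exact approach: the helper name `command_succeeds` is made up here, and it runs a local `test -e` as a stand-in for `hadoop fs -test -e` (which likewise exits 0 when the path exists and non-zero otherwise):

```python
import subprocess

def command_succeeds(cmd):
    """Return True if the shell command exits with status 0.

    hadoop fs -test -e follows this convention: exit 0 if the
    path exists, non-zero if it does not.
    """
    result = subprocess.run(
        cmd,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

# Local stand-ins for: command_succeeds('hadoop fs -test -e hdfs://some/dir/')
print(command_succeeds('test -e /'))                    # root always exists -> True
print(command_succeeds('test -e /no/such/path'))        # missing path -> False
```

To use this against HDFS, swap the stand-in command for `hadoop fs -test -e hdfs://your/path/` on a machine where the hadoop CLI is available.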