
Number of subdirectories in a directory?


How to find the number of subdirectories in a specified directory in HDFS?


When I do hadoop fs -ls /mydir/, I get a Java heap space error because the directory is too big. But all I am actually interested in is the number of subdirectories in that directory. I tried:

[gsamaras@gwta3000 ~]$ hadoop fs -find /mydir/ -maxdepth 1 -type d -print | wc -l
find: Unexpected argument: -maxdepth
0

I know that the directory is not empty, so 0 cannot be correct:

[gsamaras@gwta3000 ~]$ hadoop fs -du -s -h /mydir
737.5 G  /mydir

Solution

  • The -maxdepth option fails because HDFS find currently recognizes only the -name, -iname, and -print expressions. The command to use instead is:

    hdfs dfs -ls -R /path/to/mydir/ | grep "^d" | wc -l

    On a directory this large, however, this will hit the same java.lang.OutOfMemoryError: Java heap space. To avoid the error, increase the client's Java heap space first:

    export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx5g"

    and then run either:

    hdfs dfs -ls -R /path/to/mydir/ | grep "^d" | wc -l    # for all sub-directories

    or:

    hdfs dfs -ls /path/to/mydir/ | grep "^d" | wc -l       # for maxdepth=1
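
If only the count matters, a lighter-weight alternative (not part of the original answer, but a standard HDFS shell command) is hdfs dfs -count, which fetches a ContentSummary computed on the NameNode rather than streaming every listing line through the client, so it should sidestep the client-side heap problem without raising HADOOP_CLIENT_OPTS. Its first output column, DIR_COUNT, counts all directories under the path recursively, like the -ls -R variant above:

    hdfs dfs -count /path/to/mydir/
    # output columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME

    # DIR_COUNT includes /path/to/mydir/ itself, so subtract 1 to get the
    # number of sub-directories:
    echo $(( $(hdfs dfs -count /path/to/mydir/ | awk '{print $1}') - 1 ))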