How to find the number of subdirectories in a specified directory in HDFS?
When I run
hadoop fs -ls /mydir/
I get a Java heap space error because the directory is too large, but all I actually need is the number of subdirectories in that directory. I tried:
[gsamaras@gwta3000 ~]$ hadoop fs -find /mydir/ -maxdepth 1 -type d -print | wc -l
find: Unexpected argument: -maxdepth
0
I know that the directory is not empty, so 0 is not correct:
[gsamaras@gwta3000 ~]$ hadoop fs -du -s -h /mydir
737.5 G /mydir
HDFS find does not support -maxdepth, which is why that attempt failed. Use -ls instead:
hdfs dfs -ls -R /path/to/mydir/ | grep "^d" | wc -l
On a very large tree this can still fail with java.lang.OutOfMemoryError: Java heap space, because the client buffers the whole listing. To avoid the error, increase the client JVM's maximum heap (here to 5 GB) before rerunning the command:
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx5g"
and then
hdfs dfs -ls -R /path/to/mydir/ | grep "^d" | wc -l
# counts all subdirectories, recursively
or
hdfs dfs -ls /path/to/mydir/ | grep "^d" | wc -l
# counts immediate subdirectories only (the equivalent of maxdepth=1)
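The pipeline works because -ls prints one entry per line with a leading "d" for directories, so grep "^d" keeps only directory lines and wc -l counts them. A minimal local-filesystem sketch of the same logic (using a temporary directory, since plain ls -l marks directories with a leading "d" just like hdfs dfs -ls):

```shell
# Create a throwaway directory with 3 subdirectories and 1 file
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b" "$demo/c"
touch "$demo/file.txt"

# Lines starting with "d" are directories; count them
ls -l "$demo" | grep "^d" | wc -l    # prints 3
```

If you only need counts and not the listing itself, hdfs dfs -count /path/to/mydir/ is also worth trying: it reports the recursive directory count, file count, and content size in one line, computed server-side, so the client never has to hold the full listing in memory (note its directory count is recursive and includes the directory itself, so it is not a direct substitute for the maxdepth=1 variant).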