Tags: file, hadoop, awk, hdfs

HDFS: how to list zero-size files in a specific directory


For example, I want to print the paths of all zero-size files in a specific directory such as hdfs://<DIRECTORY>.

-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03767.pb.zstd
-rw-r--r--   3 USER supergroup      71667 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03768.pb.zstd
-rw-r--r--   3 USER supergroup      94330 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03769.pb.zstd
-rw-r--r--   3 USER supergroup      14756 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03770.pb.zstd
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03771.pb.zstd
// output
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03767.pb.zstd
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03771.pb.zstd

I want to use hdfs dfs -ls or hdfs dfs -du together with awk, but I am not familiar with awk.
How can I implement this?
Thanks in advance.


Solution

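  • If the output of hdfs dfs -ls is reliable, keep the regular-file lines (those starting with -) whose fifth column, the size in bytes, is 0:

    $ hdfs dfs -ls hdfs://<DIRECTORY> | awk '/^-/ && $5 == 0'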
    -rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03767.pb.zstd
    -rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03771.pb.zstd
    

    This is one of the simplest awk commands ;)
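
    The question asks for the paths only, so one possible follow-up is to print just the last column instead of the whole line. The sketch below is only an illustration under a few assumptions: zero_size_paths is a hypothetical helper name (not part of Hadoop), the sample listing is copied from the question so it runs without a cluster, the "Found 5 items" header is added on the assumption that hdfs dfs -ls prints such a line for a directory, and paths are assumed to contain no whitespace.

```shell
#!/bin/sh
# zero_size_paths: hypothetical helper, not part of Hadoop.
# Reads `hdfs dfs -ls`-style lines on stdin and prints the path of each
# zero-byte regular file.
zero_size_paths() {
    # /^-/    keep regular files only (skips directories and the header line)
    # $5 == 0 the fifth column is the size in bytes
    # $NF     the last column is the path (assumes no whitespace in paths)
    awk '/^-/ && $5 == 0 { print $NF }'
}

# Sample input taken from the question; the "Found 5 items" header is an
# assumption about what `hdfs dfs -ls` prints for a directory.
zero_size_paths <<'EOF'
Found 5 items
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03767.pb.zstd
-rw-r--r--   3 USER supergroup      71667 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03768.pb.zstd
-rw-r--r--   3 USER supergroup      94330 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03769.pb.zstd
-rw-r--r--   3 USER supergroup      14756 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03770.pb.zstd
-rw-r--r--   3 USER supergroup          0 2022-10-23 21:52 hdfs://<DIRECTORY>/part-03771.pb.zstd
EOF
```

    Against a real cluster the same function would be fed from the listing, e.g. hdfs dfs -ls hdfs://<DIRECTORY> | zero_size_paths. The /^-/ guard matters because a line with fewer than five fields, such as the header, also satisfies $5 == 0 in awk: a missing field compares equal to zero.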