
DFS Used%: 100.00% Slave VMs down in Hadoop


My slave VMs went down, and I suspect it's because DFS Used is at 100%. Can you please give a systematic approach to solving this problem? Is it a firewall problem, a capacity problem, or something else, and how can it be fixed?

ubuntu@anmol-vm1-new:~$  hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

15/12/13 22:25:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 845446217728 (787.38 GB)
Present Capacity: 797579996211 (742.80 GB)
DFS Remaining: 794296401920 (739.75 GB)
DFS Used: 3283594291 (3.06 GB)
DFS Used%: 0.41%
Under replicated blocks: 1564
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (4 total, 2 dead)

Live datanodes:
Name: 10.0.1.190:50010 (anmol-vm1-new)
Hostname: anmol-vm1-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 1641142625 (1.53 GB)
Non DFS Used: 25955075743 (24.17 GB)
DFS Remaining: 395126890496 (367.99 GB)
DFS Used%: 0.39%
DFS Remaining%: 93.47%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Sun Dec 13 22:25:51 UTC 2015


Name: 10.0.1.193:50010 (anmol-vm4-new)
Hostname: anmol-vm4-new
Decommission Status : Normal
Configured Capacity: 422723108864 (393.69 GB)
DFS Used: 1642451666 (1.53 GB)
Non DFS Used: 21911145774 (20.41 GB)
DFS Remaining: 399169511424 (371.76 GB)
DFS Used%: 0.39%
DFS Remaining%: 94.43%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Sun Dec 13 22:25:51 UTC 2015


Dead datanodes:
Name: 10.0.1.191:50010 (anmol-vm2-new)
Hostname: anmol-vm2-new
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Sun Dec 13 21:20:12 UTC 2015


Name: 10.0.1.192:50010 (anmol-vm3-new)
Hostname: anmol-vm3-new
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Sun Dec 13 22:09:27 UTC 2015

Solution

  • In the VM there is only one file system. Log in as root, then:

    1. Run df -h (one of the mount points will show ~100% usage)
    2. Run du -sh /* (it will list the size of each top-level directory)
    3. If any directory other than your namenode and datanode directories is taking up too much space, you can start cleaning up
    4. You can also run hadoop fs -du -h /user/hadoop to see the usage of each directory under /user/hadoop (add -s for a single total)
    5. Identify all the unnecessary directories and start cleaning up by running hadoop fs -rm -R /user/hadoop/raw_data (-rm deletes, -R deletes recursively; be careful while using -R)
    6. Run hadoop fs -expunge to empty the trash immediately (sometimes you need to run it multiple times)
    7. Run hadoop fs -du -s -h / to see the HDFS usage of the entire file system, or run hdfs dfsadmin -report, to confirm that storage has been reclaimed (a consolidated sketch of these steps follows below)
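
Putting it together, here is a minimal shell sketch of the whole cleanup workflow. The path /user/hadoop/raw_data is just the example from step 5; substitute whatever unnecessary directories you actually identify.

    # Steps 1-2: find the full mount point and what is filling it (run as root)
    df -h
    du -sh /*

    # Step 4: check HDFS usage of each directory under /user/hadoop
    hadoop fs -du -h /user/hadoop

    # Step 5: delete an unnecessary directory
    # (example path; -R is recursive, so double-check it first)
    hadoop fs -rm -R /user/hadoop/raw_data

    # Step 6: empty the trash immediately (repeat if space is not freed)
    hadoop fs -expunge

    # Step 7: confirm that the storage was reclaimed
    hadoop fs -du -s -h /
    hdfs dfsadmin -report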