Tags: hadoop, hdfs, hortonworks-data-platform

Cluster hosts have more storage space than HDFS seems to recognize / have access to? How to increase HDFS storage use?


I'm having a problem where HDFS (HDP v3.1.0) is running out of storage space (which is also causing problems with Spark jobs hanging in ACCEPTED mode). I assume there is some configuration that would let HDFS use more of the storage space already present on the node hosts, but exactly which was not clear from quick googling. Can anyone with more experience help with this?

In the Ambari UI and the NameNode UI, I see HDFS reported as nearly full (screenshots omitted).

Yet when looking at the overall hosts via the Ambari UI, there still appears to be a good amount of space left on the cluster hosts (the last 4 nodes in the list are the datanodes, and each has 140 GB of total storage; screenshot omitted).

Not sure which settings are relevant, but here are the general settings in Ambari (screenshot omitted). My interpretation of the "Reserved Space for HDFS" setting is that 13 GB should be reserved for non-DFS (i.e. local FS) storage, so it does not seem to make sense that HDFS is already running out of space. Am I interpreting this wrongly? Are there other HDFS configs I should add to this question?
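As a sanity check on that interpretation, the reserved space roughly accounts for the capacity HDFS reports. A sketch, assuming the 101 GB root mount backing the datanode dirs (shown further down) and 13 GB reserved per node; `dfs.datanode.du.reserved` is the hdfs-site property behind Ambari's "Reserved Space for HDFS":

```shell
# The hdfs-site property behind Ambari's "Reserved Space for HDFS" (in bytes):
#   hdfs getconf -confKey dfs.datanode.du.reserved
# Back-of-the-envelope capacity math for this cluster (GB, rounded):
disk_per_node=101   # size of the mount backing the datanode dir
reserved=13         # reserved for non-DFS use
datanodes=4
usable=$((disk_per_node - reserved))
echo "${usable} GB per node, $((usable * datanodes)) GB cluster-wide"
# -> 88 GB per node, 352 GB cluster-wide
```

That 352 GB is essentially the 353.3 GB HDFS reports, so the advertised capacity is consistent with disk size minus the reserve.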

Looking at the disk usage by HDFS, I see...

[hdfs@HW001 root]$ hdfs dfs -du -h /
1.3 G    4.0 G    /app-logs
3.7 M    2.3 G    /apps
0        0        /ats
899.1 M  2.6 G    /atsv2
0        0        /datalake
39.9 G   119.6 G  /etl
1.7 G    5.2 G    /hdp
0        0        /mapred
92.8 M   278.5 M  /mr-history
19.5 G   60.4 G   /ranger
4.4 K    13.1 K   /services
11.3 G   34.0 G   /spark2-history
1.8 M    5.4 M    /tmp
4.3 G    42.2 G   /user
0        0        /warehouse

for a total of ~269 GB consumed across replicas (perhaps a shorter cleanup interval for spark2-history would help as well?). Looking at the free space on HDFS, I see...

[hdfs@HW001 root]$ hdfs dfs -df -h /
Filesystem                        Size     Used  Available  Use%
hdfs://hw001.ucera.local:8020  353.3 G  244.1 G     31.5 G   69%

Yet Ambari reports 91% capacity used, which seems odd to me (unless I am misinterpreting something here; let me know). It also conflicts with what I see when looking at disk space on the local FS where the HDFS datanode dirs are located...

[root@HW001 ~]# clush -ab -x airflowet df -h /hadoop/hdfs/data
HW001: df: ‘/hadoop/hdfs/data’: No such file or directory
airflowetl: df: ‘/hadoop/hdfs/data’: No such file or directory
---------------
HW002
---------------
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/centos_mapr001-root  101G   93G  8.0G  93% /
---------------
HW003
---------------
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/centos_mapr001-root  101G   94G  7.6G  93% /
---------------
HW004
---------------
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/centos_mapr001-root  101G   92G  9.2G  91% /
---------------
HW005
---------------
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/centos_mapr001-root  101G   92G  9.8G  91% /
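One way to reconcile the two percentages: `hdfs dfsadmin -report` breaks each datanode's capacity into DFS Used, Non DFS Used, and DFS Remaining. Non-DFS usage on the same mounts counts toward what Ambari and `df` show but is invisible to `hdfs dfs -df`. A sketch of pulling that figure out of the report (the excerpt below is hypothetical, not from this cluster):

```shell
# Hypothetical excerpt of `hdfs dfsadmin -report` for one datanode;
# the real command is:  hdfs dfsadmin -report
report='Configured Capacity: 94839537664 (88.33 GB)
DFS Used: 65498251264 (61.00 GB)
Non DFS Used: 20401094656 (19.00 GB)
DFS Remaining: 8940191744 (8.33 GB)'
# Non-DFS usage is what lets Ambari/df report a higher Use% than hdfs dfs -df:
echo "$report" | awk -F'[()]' '/Non DFS Used/ { print $2 }'
# -> 19.00 GB
```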

Looking at the block report for the hdfs root...

[hdfs@HW001 root]$ hdfs fsck / -files -blocks
.
.
.
Status: HEALTHY
 Number of data-nodes:  4
 Number of racks:               1
 Total dirs:                    8734
 Total symlinks:                0

Replicated Blocks:
 Total size:    84897192381 B (Total open files size: 10582 B)
 Total files:   43820 (Files currently being written: 10)
 Total blocks (validated):      42990 (avg. block size 1974812 B) (Total open file blocks (not validated): 8)
 Minimally replicated blocks:   42990 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       1937 (4.505699 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.045057
 Missing blocks:                0
 Corrupt blocks:                0
 Missing replicas:              11597 (8.138018 %)

Erasure Coded Block Groups:
 Total size:    0 B
 Total files:   0
 Total block groups (validated):        0
 Minimally erasure-coded block groups:  0
 Over-erasure-coded block groups:       0
 Under-erasure-coded block groups:      0
 Unsatisfactory placement block groups: 0
 Average block group size:      0.0
 Missing block groups:          0
 Corrupt block groups:          0
 Missing internal blocks:       0
FSCK ended at Tue May 26 12:10:43 HST 2020 in 1717 milliseconds


The filesystem under path '/' is HEALTHY
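The fsck numbers are also roughly consistent with the `-df` output above: logical bytes times the average replication factor should land near the reported "Used" figure. A quick check using the fsck values verbatim:

```shell
# 84897192381 B of logical data at an average replication of 3.045057:
LC_ALL=C awk 'BEGIN { printf "%.1f GiB\n", 84897192381 * 3.045057 / 2^30 }'
# -> 240.8 GiB
```

That is in the same ballpark as the 244.1 G "Used" from `hdfs dfs -df` (the gap being open files, blocks being written, and rounding), so nothing looks missing or corrupt; the cluster really is holding ~3x replicas of ~85 GB of data.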

Also, could someone let me know if this may be due to other problems I am not seeing?


Solution

  • You haven't mentioned whether there is junk data (in /tmp, for example) that could be cleaned up.

    Each datanode has 88.33 GB of DFS storage, then? (353.3 GB total / 4 datanodes)

    If so, you cannot conjure more space out of the existing disks; extra capacity has to come from new disks attached to the cluster hosts.

    dfs.datanode.data.dir (formerly dfs.data.dir) in hdfs-site is a comma-separated list of mounted volumes on each datanode.

    To get more storage, you need to format and mount more disks, add their paths to that property, and restart the DataNodes.
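Those steps might look something like this on each datanode. A hypothetical sketch: the device name /dev/sdb1 and the second mount point are assumptions, and on HDP you would normally make the property change through Ambari rather than by editing hdfs-site.xml by hand:

```shell
# Format and mount a newly attached disk (device name is hypothetical):
#   mkfs.xfs /dev/sdb1
#   mkdir -p /hadoop/hdfs/data2
#   mount /dev/sdb1 /hadoop/hdfs/data2
#   chown hdfs:hadoop /hadoop/hdfs/data2
# Then extend the comma-separated volume list and restart the DataNodes:
new_dirs="/hadoop/hdfs/data,/hadoop/hdfs/data2"
echo "dfs.datanode.data.dir=${new_dirs}"
```

Remember to add the new mount to /etc/fstab as well, or the directory will be empty (and on the root disk) after a reboot.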