Tags: hadoop, hdfs, hadoop2

How are files and directories stored in Hadoop HDFS?


I created a file in HDFS using the command below:

hdfs dfs -touchz /hadoop/dir1/file1.txt

I can see the created file with this command:

hdfs dfs -ls /hadoop/dir1/

However, I could not find the file's location with Linux commands (find or locate). Searching online, I found the question "How to access files in Hadoop HDFS?", which says that HDFS is virtual storage. In that case, how does HDFS decide which partition to use and how much space to take, and where is the metadata stored?

Is it using the DataNode location that I specified in hdfs-site.xml as the virtual storage for all the data?

I looked in the DataNode directory and there are files there, but I could not find anything related to the file or folder I created.

(I am using Hadoop 2.6.0.)


Solution

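  • HDFS is a distributed file system: the storage it presents is virtual, built from the disk space of all the DataNodes. When you installed Hadoop, you will have set (or left at their defaults) the paths for dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml. These are the local directories where HDFS keeps its files on each node: dfs.namenode.name.dir holds the NameNode's metadata (the namespace image and edit log, i.e. which files and directories exist and which blocks make them up), and dfs.datanode.data.dir holds the actual data blocks on each DataNode.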

    Data written to HDFS is stored as blocks of a configured size (dfs.blocksize, 128 MB by default in Hadoop 2.x). The hdfs dfs commands show you complete files, but internally each file is split into blocks, and each block is stored on a DataNode as an ordinary local file named blk_<block ID>, with a matching .meta checksum file. That is why you see many files under the DataNode directory but none named after your HDFS files or folders. Directories, and zero-length files such as the one created with -touchz, have no blocks at all: they exist only in the NameNode metadata, so nothing about them appears on the DataNodes.

    The output of the following command shows how much capacity each DataNode contributes to the virtual HDFS storage and how much of it is used:

    hdfs dfsadmin -report

    or, on clusters where the HDFS superuser is the hdfs account:

    sudo -u hdfs hdfs dfsadmin -report

    HTH
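To make the block layout described above concrete, here is a minimal Python sketch (not part of Hadoop; the function name block_sizes is hypothetical) that computes how a file of a given size would be divided into blocks, assuming dfs.blocksize is left at the Hadoop 2.x default of 128 MB. It also shows why a zero-length file created with -touchz produces no block files on the DataNodes.

```python
from typing import List

# Assumption: dfs.blocksize is left at the Hadoop 2.x default of 128 MB.
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # bytes


def block_sizes(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size in bytes of each block a file of file_size bytes is split into."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        # The last block holds only the leftover bytes; its blk_ file on the
        # DataNode is that size, not a full 128 MB.
        sizes.append(remainder)
    return sizes


print(len(block_sizes(0)))                        # -> 0 (empty file: no blocks)
print([s // MB for s in block_sizes(300 * MB)])   # -> [128, 128, 44]
```

On a real cluster, hdfs fsck <path> -files -blocks -locations reports the actual blocks of a file and which DataNodes hold them, so you can match the block IDs against the blk_ files in the DataNode directories.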