
Clarity of terms used in HDFS?


I have come across several terms while getting familiar with HDFS, such as namespace, metadata, transaction logs, fsimage, and editlogs.

Sometimes it appears that all these terms describe the same thing, namely "holding some information", but I am not clear on this.

In general, metadata means data about data. So does "metadata" cover all of these terms, or does each term serve a different purpose in the context of Hadoop HDFS?


Solution

  • Namespace: In Hadoop, the 'namespace' refers to the file and directory names, with their paths, maintained by the NameNode.

    Metadata: Information about each file, such as its name, size, permissions, and so on. This metadata is stored in a file called fsimage.

    fsimage: The complete state of the HDFS file system at a point in time.

    Changes made to the file system are not written to the fsimage immediately; instead, they are stored in a separate file called the editlog, in the same location as the fsimage.

    editlog: A log that lists each file system change made after the most recent fsimage. The term "transaction log" refers to this same log.
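
To make the relationship between these terms concrete, here is a minimal toy model in Python. It is only a sketch and not how the NameNode is actually implemented: the class `ToyNameNode` and its methods are hypothetical names invented for illustration. It assumes the fsimage can be pictured as a dictionary from path to metadata, and the editlog as an ordered list of changes made since that snapshot. The current state is the fsimage with the editlog replayed on top, and a checkpoint folds the edits into a new fsimage.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical toy model; not the real NameNode implementation."""

    # fsimage: snapshot of path -> metadata as of the last checkpoint
    fsimage: dict = field(default_factory=dict)
    # editlog: ordered list of changes made after that snapshot
    editlog: list = field(default_factory=list)

    def create(self, path, size, permissions):
        # A change is appended to the editlog; the fsimage is not touched.
        meta = {"size": size, "permissions": permissions}
        self.editlog.append(("create", path, meta))

    def delete(self, path):
        self.editlog.append(("delete", path, None))

    def current_state(self):
        # Current metadata = fsimage + every edit replayed in order.
        state = dict(self.fsimage)
        for op, path, meta in self.editlog:
            if op == "create":
                state[path] = meta
            elif op == "delete":
                state.pop(path, None)
        return state

    def namespace(self):
        # The namespace is just the set of paths, without the other metadata.
        return sorted(self.current_state())

    def checkpoint(self):
        # Fold the edits into a new fsimage and start an empty editlog.
        self.fsimage = self.current_state()
        self.editlog = []


nn = ToyNameNode()
nn.create("/user/alice/a.txt", 128, "rw-r--r--")
nn.create("/user/alice/b.txt", 64, "rw-r--r--")
nn.delete("/user/alice/a.txt")
print(nn.namespace())    # paths that currently exist
print(len(nn.editlog))   # edits recorded since the last fsimage
nn.checkpoint()
print(nn.fsimage)        # snapshot now reflects all edits
```

In this picture, "metadata" is the umbrella term: the namespace is the part of the metadata that names files and directories, while the fsimage and the editlog are the two on-disk forms in which that metadata is persisted.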
