Tags: hadoop, hdfs, hadoop2

What is the best way to inspect the format of data stored in Hadoop HDFS?


I loaded a CSV file of nearly 50 GB into a Hadoop cluster, and I want to look at a few sample records to identify the columns.

I have tried using

hadoop fs -cat employees.csv | head -n 10

My questions are:

  1. Is this the right command for viewing the data?
  2. Does head -n 10 load all 50 GB and then filter out the first 10 lines? How does it work?
  3. Is there a better approach?

Solution

  • The right command depends on your Hadoop version.

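    For older Hadoop versions (< 3.1.0), pipe the output of -cat into head:

    hadoop fs -cat employees.csv | head -n 10

    This does not read all 50 GB. head exits after printing 10 lines, which closes the pipe, and hadoop fs -cat stops streaming soon afterwards.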

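    For newer Hadoop versions (>= 3.1.0), use the built-in -head option, which prints the first kilobyte of the file rather than a fixed number of lines:

    hadoop fs -head employees.csv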
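
On the second question (whether head -n 10 makes the pipeline read the whole file), the behavior can be checked locally without a cluster. The sketch below is only an illustration under assumptions: `yes` stands in for `hadoop fs -cat` as a producer that would never finish on its own, and the row `id,name,dept` is invented sample data, not taken from employees.csv.

```shell
#!/bin/sh
# Local stand-in for: hadoop fs -cat employees.csv | head -n 10
# `yes` (an assumed substitute for the HDFS reader) repeats one line forever;
# the row "id,name,dept" is made-up sample data.
# `|| true` only matters if the shell runs with `set -e` and pipefail enabled.
sample=$(yes "id,name,dept" | head -n 3) || true

# head exited after 3 lines, so the endless producer was stopped as well;
# otherwise this script would never reach this point.
printf '%s\n' "$sample"
printf 'lines read: %s\n' "$(printf '%s\n' "$sample" | wc -l | tr -d ' ')"
```

The script returns immediately even though the producer is infinite, which suggests the same pipeline shape against HDFS should stop after a small amount of data has been streamed rather than after the full 50 GB.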