hadoop sqoop

Query related to sqoop-import?


Scenario:

I have imported data from SQL Server into HDFS. The data is stored in the HDFS directory as multiple files:

part-m-00000
part-m-00001
part-m-00002
part-m-00003

Question:

When reading this data back from the HDFS directory, do I have to read all of the files (part-m-00000 through part-m-00003), or just part-m-00000? I ask because when I read the data, some of it appeared to be missing. Is that expected, or did I miss something?


Solution

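  • You need to read all of the files, not just part-m-00000. Sqoop runs the import as a MapReduce job and splits the work across several map tasks; each map task writes its share of the rows to its own part-m-NNNNN file. Reading only one file therefore gives you only a fraction of the table, which is why data appears to be missing. The number of files matches the number of map tasks (4 by default, configurable with -m / --num-mappers).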
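
As a rough illustration of reading every part file rather than only the first, here is a minimal Python sketch. It assumes the import directory has already been copied to the local filesystem (for example with `hdfs dfs -get`) and that the files are plain text with one record per line; the function name `read_sqoop_output` and the directory argument are hypothetical, not part of Sqoop.

```python
import glob
import os


def read_sqoop_output(directory):
    """Return the records from every part-m-* file in `directory`.

    Assumes a local copy of the Sqoop import directory containing
    text files with one record per line (an assumption for this sketch).
    """
    records = []
    # The pattern skips marker files such as _SUCCESS; sorted() keeps
    # part-m-00000, part-m-00001, ... in numeric order.
    for path in sorted(glob.glob(os.path.join(directory, "part-m-*"))):
        with open(path, encoding="utf-8") as f:
            records.extend(line.rstrip("\n") for line in f)
    return records
```

Directly on HDFS, the same idea can be expressed with a wildcard, for example `hdfs dfs -cat <import-dir>/part-m-*`, which concatenates all of the part files.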
