I have imported data from SQL Server into HDFS. The data is stored in the HDFS directory as multiple files:
part-m-00000
part-m-00001
part-m-00002
part-m-00003
My question is: when reading this stored data from the HDFS directory, do we have to read all the files (part-m-00000, 01, 02, 03) or just part-m-00000? I ask because when I read the data, I found that some of it appeared to be missing.
Is this expected, or did I miss something?
You need to read all the files, not just part-m-00000. There are multiple files because Sqoop works in a MapReduce fashion, splitting the import work into multiple parts. The output of each part is written to a separate file, so each file holds only a portion of the imported rows.
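As a rough illustration of reading every part file rather than only the first, here is a minimal Python sketch. It assumes the import directory has already been copied to the local filesystem (for example with `hdfs dfs -get`) and that the files are plain text. The function name `read_all_parts` and the directory path are hypothetical and only serve the example.

```python
import glob
import os


def read_all_parts(directory):
    """Return the lines of every part-m-* file in `directory`, in file order.

    Assumes a local copy of the import directory containing text files.
    """
    lines = []
    # sorted() keeps part-m-00000, part-m-00001, ... in order;
    # the glob pattern skips marker files such as _SUCCESS.
    for path in sorted(glob.glob(os.path.join(directory, "part-m-*"))):
        with open(path, encoding="utf-8") as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines
```

Calling `read_all_parts("/tmp/my_import")` (a hypothetical path) would return the rows from all four files combined. If you would rather have a single local file, `hdfs dfs -getmerge <hdfs_dir> <local_file>` concatenates the files in a directory for you.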