
Does Azure Blob Storage with HDInsight split files on complete lines?


If I store files for HDInsight in Azure Blob Storage (ASV) and then write MapReduce functions, does the system split those files cleanly on complete lines of data when it divides them up for the cluster to process? Or is anything special needed to make sure a line of data doesn't span the boundary of a file block and become unreadable because part of it is delivered to one data node and part to another?

If so, how does it do this?


Solution

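  • I found the answer elsewhere, and it is yes: nothing special is needed for plain text input. When HDInsight reads from the distributed file system, the record reader for each fragment (input split) negotiates the line boundaries. With Hadoop's standard text input format, a reader whose split does not begin at the start of the file discards everything up to the first line break, and every reader keeps reading past the end of its split until it has finished its last line. Each line is therefore delivered whole to exactly one mapper, even when its bytes straddle a block boundary.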
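
To make the boundary rule concrete, here is a minimal Python sketch that simulates it on an in-memory byte string. It is not Hadoop code: `read_split` and `split_all` are hypothetical names made up for this illustration, and the logic only mirrors my understanding of how Hadoop's line-oriented record reader behaves (skip the first line of any split that does not start at offset 0, and read one line past the end of the split).

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the lines owned by the split [start, start + length).

    Hypothetical helper: simulates the line-boundary rule, not the real Hadoop API.
    """
    end = start + length
    pos = start
    if start != 0:
        # The reader of the previous split owns the line that crosses (or
        # starts exactly at) our start offset, so skip past the first newline.
        newline = data.find(b"\n", start)
        if newline == -1:
            return []  # no line starts inside this split
        pos = newline + 1

    lines = []
    # Keep reading while the next line starts at or before the split end.
    # This may run past `end` in order to finish the last line.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            lines.append(data[pos:])      # final line without trailing newline
            pos = len(data)
        else:
            lines.append(data[pos:newline])
            pos = newline + 1
    return lines


def split_all(data: bytes, split_size: int) -> list[list[bytes]]:
    """Cut `data` into fixed-size splits and read each one independently."""
    return [read_split(data, start, split_size)
            for start in range(0, len(data), split_size)]


if __name__ == "__main__":
    sample = b"aaa\nbbb\nccc\n"
    # A split size of 5 cuts the file in the middle of "bbb" and of "ccc".
    print(split_all(sample, 5))  # [[b'aaa', b'bbb'], [b'ccc'], []]
```

Note that the second split begins in the middle of `bbb` yet never emits a partial line: the first reader ran past its boundary to finish `bbb`, and the second reader skipped ahead to `ccc`. A split can end up owning no lines at all, which is harmless.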