Cloudera Impala: How does it read data from HDFS blocks?

I have a basic question about Impala. Impala lets you query data that is stored in HDFS. Suppose a file is split into multiple blocks and a line of text is spread across two of them. In Hive/MapReduce, the RecordReader takes care of this.

How does Impala read the record in such a scenario?

Solution

Referencing my answer on the Impala user list:

When Impala finds an incomplete record (which can happen when scanning certain file formats, such as text or RC files), it continues to read incrementally from the next block(s) until it has read the entire record. Note that this may require a small amount of 'remote reads' (reading from a remote datanode), but this is usually very little compared to the entire block, which should have been read locally (and ideally via a short-circuit read).
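
To make the idea concrete, here is a minimal sketch in Python. It is not Impala's scanner code. It simulates a newline-delimited file cut into fixed-size "blocks", and the function names (`read_split`, `scan_file`) and the ownership rule (a record belongs to the block in which it starts) are assumptions made for illustration. Each reader skips a leading partial record, because the previous block's reader completes it, and reads past the end of its own block to finish its last record.

```python
from typing import List


def read_split(data: bytes, start: int, end: int, delim: bytes = b"\n") -> List[bytes]:
    """Return the records that START inside the byte range [start, end).

    Hypothetical helper: 'data' stands in for the whole file, and reading
    beyond 'end' stands in for fetching bytes from the next block.
    """
    pos = start
    if start != 0:
        # We may be in the middle of a record that began in an earlier block.
        # Search from start - 1 so that a delimiter sitting exactly at the
        # block boundary is recognised and nothing is skipped needlessly.
        idx = data.find(delim, start - 1)
        if idx == -1:
            return []  # no record starts in this split
        pos = idx + 1

    records = []
    while pos < end and pos < len(data):
        idx = data.find(delim, pos)
        if idx == -1:
            # Last record in the file has no trailing delimiter.
            records.append(data[pos:])
            pos = len(data)
        else:
            # idx may lie beyond 'end': this is the read into the next block.
            records.append(data[pos:idx])
            pos = idx + 1
    return records


def scan_file(data: bytes, block_size: int) -> List[List[bytes]]:
    """Scan every block independently and return the records per block."""
    return [
        read_split(data, start, min(start + block_size, len(data)))
        for start in range(0, len(data), block_size)
    ]


if __name__ == "__main__":
    sample = b"alpha\nbravo charlie\ndelta\n"
    # With 10-byte blocks, "bravo charlie" starts in block 0 and ends in block 1.
    print(scan_file(sample, 10))
```

With 10-byte blocks, the first reader returns both `alpha` and the complete `bravo charlie` even though the latter ends in the second block. The second reader returns nothing because no record starts in its range, and the third returns `delta`. Every record is read exactly once, and only the tail of a straddling record is fetched from outside the reader's own block.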
