hadoop, hdfs, greenplum

Does Greenplum PXF support HDFS short circuit read?


I wonder whether Greenplum PXF can take advantage of HDFS short-circuit reads when PXF and the DataNode run on the same host. We ran a preliminary test, but PXF did not appear to use short-circuit reads. A web search turned up almost nothing, so we are not sure whether we missed something. We use Greenplum 6.4 (community version), PXF 5.11.2, and CDH 6.3.

Any references, suggestions, or comments are much appreciated.


Solution

  • The old version of PXF, shipped with HAWQ, actually ran on the DataNodes and used short-circuit reads. The current PXF runs on the Greenplum segment hosts instead and acts like an HDFS client. I think you could modify the PXF source code and set up PXF on the DataNodes with short-circuit reads enabled. However, that would speed up HDFS<->PXF communication while slowing down PXF<->Greenplum segment communication.
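If you do try co-locating PXF with a DataNode, it may help to first confirm that the client-side configuration PXF loads actually requests short-circuit reads. Below is a minimal sketch, not anything taken from PXF itself. It parses a Hadoop-style `hdfs-site.xml` and checks the two standard HDFS client properties, `dfs.client.read.shortcircuit` and `dfs.domain.socket.path`. The helper names, the sample XML, and the socket path are illustrative assumptions. So is the file location (something like `$PXF_CONF/servers/default/hdfs-site.xml` on PXF 5.x), which you should verify for your install.

```python
import xml.etree.ElementTree as ET

# Sample config for illustration only; the socket path is a placeholder and
# must match whatever the DataNode on the same host is configured with.
SAMPLE_HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hdfs-sockets/dn</value>
  </property>
</configuration>
"""


def read_hadoop_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def short_circuit_requested(props):
    """True if the config asks for short-circuit reads and names a socket path."""
    enabled = props.get("dfs.client.read.shortcircuit", "false").lower() == "true"
    has_socket = bool(props.get("dfs.domain.socket.path"))
    return enabled and has_socket


if __name__ == "__main__":
    props = read_hadoop_properties(SAMPLE_HDFS_SITE)
    print("short-circuit requested:", short_circuit_requested(props))
```

Note that this only checks what the configuration requests. Short-circuit reads also need the client and the DataNode on the same host, a domain socket the client can access, and the native Hadoop library loaded, so a passing check does not prove that reads are actually short-circuited.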