How can I check which HDFS datanode IP the namenode returns to Spark?

Suppose I'm reading or writing a DataFrame in PySpark and specify the HDFS namenode hostname and port:

 df.write.parquet("hdfs://namenode:8020/test/go", mode="overwrite")

Is there any way to see which specific datanode hosts and ports that namenode returns to Spark?


  • I only needed to set the Spark log level to DEBUG; the datanode addresses then show up in the log output.
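
    For anyone wanting to try this, here is a minimal sketch of one way to raise the log level from PySpark itself. It relies on SparkContext.setLogLevel, which is a real PySpark call; the helper name enable_hdfs_debug_logging, the app name, and the small spark.range DataFrame are made up for illustration, and the hdfs:// URL is just the placeholder from the question.

```python
def enable_hdfs_debug_logging(spark):
    """Raise the log level to DEBUG so HDFS client messages are printed."""
    # SparkContext.setLogLevel overrides the configured log level at runtime.
    spark.sparkContext.setLogLevel("DEBUG")


def main():
    # Imported here so the helper above can be reused without importing
    # pyspark at module load time.
    from pyspark.sql import SparkSession

    # Hypothetical app name, used only for this example.
    spark = SparkSession.builder.appName("hdfs-datanode-debug").getOrCreate()
    enable_hdfs_debug_logging(spark)

    # Same write as in the question; host, port, and path are placeholders.
    df = spark.range(10)
    df.write.parquet("hdfs://namenode:8020/test/go", mode="overwrite")

    spark.stop()
```

    Call main() (or run the script through spark-submit) and search the output for datanode addresses. As far as I know, setLogLevel only changes the level in the driver JVM, so on a real cluster the executors, which do the actual block I/O, may need DEBUG enabled through their own log4j configuration.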
