Tags: postgresql, jdbc, pyspark, hawq

PostgreSQL error when extracting table data via JDBC from Spark


I had the Spark to HAWQ JDBC connection working, but now two days later there's an issue with extracting data from a table. Nothing's changed on the Spark configuration...

Simple step #1 - printing the schema of a simple table in HAWQ. I can create a SQLContext DataFrame and connect to the HAWQ db:

df = sqlContext.read.format('jdbc').options(url=db_url, dbtable=db_table).load()
df.printSchema()

Which prints:

root
 |-- product_no: integer (nullable = true)
 |-- name: string (nullable = true)
 |-- price: decimal (nullable = true)
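
(For reference, db_url and db_table are plain strings; the host, port, database, and credentials below are placeholder values, not the real cluster:)

# hypothetical connection details -- substitute your own HAWQ master host, database, and credentials
db_url = 'jdbc:postgresql://hawq-master.example.com:5432/testdb?user=gpadmin&password=secret'
db_table = 'products'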

But when actually trying to extract data:

df.select("product_no").show()

This error pops up...

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): 
org.postgresql.util.PSQLException: ERROR: could not write 3124 bytes to temporary file: No space left on device (buffile.c:408)  (seg33 adnpivhdwapda04.gphd.local:40003 pid=544124) (cdbdisp.c:1571)
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2182)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1911)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:173)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:615)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:465)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:350)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:372)
    at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:350)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)

Things I've tried (but I'm willing to try again if there are more precise steps):

  • Ran 'df -i' on the HAWQ master node; inode utilization is only 1%
  • Ran vacuumdb on the HAWQ database (VACUUM FULL isn't recommended on HAWQ)
  • Tried creating a tiny new database (with just the single 3-column table); no luck

This can't be an actual shortage of disk space, so where and what is tripping this up?


Solution

  • could not write 3124 bytes to temporary file: No space left on device

    The volume used for temporary files is filling up. The temp file is then deleted when the query errors out, so you don't actually see a full volume in df afterwards (a small watcher that catches the space disappearing while the query runs is sketched at the end of this answer).

    On most Linux systems /tmp is a tmpfs backed by virtual memory, so the temp location could well be one. To confirm, check mount, and check the setting of PostgreSQL's temp_tablespaces (SHOW temp_tablespaces). If it's blank, PostgreSQL uses the default tablespace, which is unlikely to be on a tmpfs; but if it's set, check where that tablespace lives. If it's on a tmpfs you may need to move it. (A way to read the setting over the same JDBC connection is sketched at the end of this answer.)

    It could also be somehow filling the main tablespace, but that's exceedingly unlikely if it's sitting at 1% utilization; do re-check with plain df though, since df -i only reports inode usage, not space. A massively runaway recursive CTE could conceivably do it, but that's a stretch.

    Quota management is also a possibility. Perhaps a filesystem quota is configured?
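
Since the temp file is gone by the time the query fails, one way to actually see the spill is to watch free space on the segment's temp volume while re-running the failing df.select("product_no").show(). A minimal sketch, assuming Python is available on the segment host; the path below is a placeholder for wherever that segment's temporary/workfile directory really lives:

import os
import time

# Placeholder path: point this at the filesystem that holds the HAWQ segment's temp/workfile directory
TEMP_DIR = '/data/hawq/segment/base/pgsql_tmp'

while True:
    st = os.statvfs(TEMP_DIR)                        # filesystem stats for the temp volume
    free_mb = st.f_bavail * st.f_frsize // (1024 * 1024)
    print('free on temp volume: %d MB' % free_mb)    # watch this drop while the query runs
    time.sleep(1)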
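
Similarly, if you don't have psql access handy, the temp_tablespaces setting can usually be read over the same Spark JDBC connection, since a parenthesised subquery can stand in for the table name (db_url is the same connection string used above):

# sketch: read current_setting('temp_tablespaces') through the existing JDBC reader
settings_query = "(SELECT current_setting('temp_tablespaces') AS temp_tablespaces) AS s"
settings_df = sqlContext.read.format('jdbc').options(url=db_url, dbtable=settings_query).load()
settings_df.show()  # an empty value means temp files go to the default tablespace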