Tags: python, amazon-web-services, pyspark, jupyter-notebook, amazon-emr

EMR pyspark notebook Spark progress widget gone


Previously, when I ran my EMR notebooks with PySpark, I had these little widgets showing the progress of each Spark job.

I'm talking about these widgets: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks-spark-monitor.html

Yesterday I had a lot of issues with the clusters not connecting to the notebooks properly, but today "everything" is fine again, with no change on our side that we are aware of.

I'm cloning previously used EMR clusters and loading previously used notebooks.

But I no longer get the little widgets; otherwise the cluster computes and works as before.

Any ideas? What do I need to check?

Thanks!

I have a bootstrap action that copies a MySQL JDBC jar to /users/hadoop/jars, but I had this before as well.

Tried:

  • Created a cluster from scratch

  • Created a notebook from scratch

  • Set up a web connection to the cluster - at least I can see progress there

  • Created various cluster configs
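While the widget is missing, progress can in principle also be read from inside the session rather than through the web connection. The following is only a rough sketch of that idea: it assumes the notebook session exposes the usual SparkContext as `sc`, and it relies on PySpark's `StatusTracker` methods `getActiveJobsIds`, `getJobInfo` and `getStageInfo`. The `FakeTracker` class is a hypothetical stand-in so the example runs without a cluster; it is not part of Spark or EMR.

```python
from collections import namedtuple


def summarize_progress(tracker):
    """Return one line per active stage: 'job J stage S: done/total tasks'."""
    lines = []
    for job_id in sorted(tracker.getActiveJobsIds()):
        job = tracker.getJobInfo(job_id)
        if job is None:  # the job may have finished between the two calls
            continue
        for stage_id in sorted(job.stageIds):
            stage = tracker.getStageInfo(stage_id)
            if stage is None:  # no information available for this stage
                continue
            lines.append(
                f"job {job_id} stage {stage_id}: "
                f"{stage.numCompletedTasks}/{stage.numTasks} tasks"
            )
    return lines


# Hypothetical stand-in for sc.statusTracker(), used only for this demo.
JobInfo = namedtuple("JobInfo", "jobId stageIds status")
StageInfo = namedtuple("StageInfo", "stageId numTasks numCompletedTasks")


class FakeTracker:
    def getActiveJobsIds(self):
        return [3]

    def getJobInfo(self, job_id):
        return JobInfo(job_id, [7, 8], "RUNNING")

    def getStageInfo(self, stage_id):
        # Stage 8 is deliberately unknown to exercise the None branch.
        return {7: StageInfo(7, 200, 50)}.get(stage_id)


demo = summarize_progress(FakeTracker())
print("\n".join(demo))
```

On a real cluster the call would be `summarize_progress(sc.statusTracker())`. Since a notebook cell normally blocks while its job runs, this would probably have to be called from a background thread or a second session; the original post does not confirm whether that works in an EMR notebook.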

EMR config:

[
  {
    "classification": "emrfs-site",
    "properties": {
      "fs.s3.enableServerSideEncryption": "true",
      "fs.s3.maxConnections": "2000"
    }
  },
  {
    "classification": "spark",
    "properties": {
      "maximizeResourceAllocation": "true"
    }
  },
  {
    "classification": "livy-conf",
    "properties": {
      "livy.server.session.timeout": "16h"
    }
  },
  {
    "configurations": [
      {
        "classification": "export",
        "properties": {
          "PYSPARK_PYTHON": "/usr/bin/python3"
        }
      }
    ],
    "classification": "spark-env",
    "properties": {}
  }
]
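One thing that can be ruled out offline is a malformed configuration. The sketch below parses the configuration above and lists every classification, including the nested `export` block under `spark-env`. The helper name `flatten_classifications` is made up for this example, and the check only confirms that the JSON is well-formed and structured as expected; it says nothing about whether the widget depends on any of these settings.

```python
import json

# The cluster configuration from the post, as a JSON string.
EMR_CONFIG = """
[
  {"classification": "emrfs-site",
   "properties": {"fs.s3.enableServerSideEncryption": "true",
                  "fs.s3.maxConnections": "2000"}},
  {"classification": "spark",
   "properties": {"maximizeResourceAllocation": "true"}},
  {"classification": "livy-conf",
   "properties": {"livy.server.session.timeout": "16h"}},
  {"configurations": [
     {"classification": "export",
      "properties": {"PYSPARK_PYTHON": "/usr/bin/python3"}}],
   "classification": "spark-env",
   "properties": {}}
]
"""


def flatten_classifications(configs, parent=""):
    """Yield (path, properties) for each classification, parents before children."""
    for entry in configs:
        name = entry["classification"]
        path = f"{parent}/{name}" if parent else name
        yield path, entry.get("properties", {})
        # Nested classifications live under the optional "configurations" key.
        yield from flatten_classifications(entry.get("configurations", []), path)


flat = dict(flatten_classifications(json.loads(EMR_CONFIG)))
for path, props in flat.items():
    print(path, props)
```

If `json.loads` raises an error or a classification is missing from the printed list, the configuration itself needs fixing before looking further.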

I get no error messages or warnings of any kind.


Solution

  • This issue has been fixed in the latest EMR Notebooks update. You should now see the Spark monitoring widget, which shows detailed Spark job information, as well as a progress bar indicating the overall progress of the cell execution.