Tags: amazon-web-services, emr, amazon-emr, apache-zeppelin

AWS EMR Zeppelin is missing the MySQL interpreter


I launched a fresh AWS EMR Spark cluster with Zeppelin to query a MySQL database. When I tried to add a MySQL interpreter in Zeppelin, the option did not exist. I searched for a way to make the interpreter appear but didn't find a solution. How can I get a MySQL interpreter in Zeppelin so I can query the MySQL database?



Solution

  • Spark SQL supports many features of SQL:2003 and SQL:2011 [1][2], so instead of a dedicated MySQL interpreter you can query MySQL through Spark on Zeppelin by adding the MySQL JDBC driver as a dependency of the Spark interpreter.

    1. Get a MySQL connector (Connector/J) JAR in a version compatible with your MySQL server.
    2. Add it as a dependency of the Spark interpreter in Zeppelin. (I put the JAR on the master node.)
    3. You should now be able to access a MySQL table. The following example uses the Scala API:

      /* Database configuration: replace the placeholders with your own values */
      val host = "HOST"              // hostname, optionally followed by :port
      val database = "DATABASE"
      val jdbcUsername = "USERNAME"
      val jdbcPassword = "PASSWORD"
      val jdbcURL = s"jdbc:mysql://${host}/${database}"
      
      import java.util.Properties
      val connectionProperties = new Properties()
      connectionProperties.setProperty("user", jdbcUsername)
      connectionProperties.setProperty("password", jdbcPassword)
      connectionProperties.setProperty("driver", "com.mysql.cj.jdbc.Driver")
      
      /* Read data from MySQL ("TABLE_NAME" is a placeholder) */
      val desiredData = spark.read.jdbc(jdbcURL, "TABLE_NAME", connectionProperties)
      desiredData.printSchema()
      
      /* Data manipulation: register the DataFrame as a view and query it with Spark SQL */
      desiredData.createOrReplaceTempView("desiredData")
      val query = """
      SELECT COUNT(*) AS `Record Number`
      FROM desiredData
      """
      spark.sql(query).show()
      
      // column1 and column2 are placeholders for real column names
      val query2 = """
      SELECT ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column1, column2) AS column3
      FROM desiredData
      """
      spark.sql(query2).show()
      // ...
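Steps 1 and 2 are the usual points of failure. As a sketch (not part of the original answer), the following snippet can be run in a Zeppelin Spark paragraph before the read to confirm that the connector JAR is visible on the interpreter's classpath. The names driverAvailable and candidates are hypothetical; the two class names are the ones MySQL Connector/J uses in 6.x and later (com.mysql.cj.jdbc.Driver) and in 5.1 (com.mysql.jdbc.Driver).

```scala
import scala.util.Try

// True if the named JDBC driver class can be loaded from the current classpath.
// Try(...) turns the ClassNotFoundException thrown for a missing class into a Failure.
def driverAvailable(driverClass: String): Boolean =
  Try(Class.forName(driverClass)).isSuccess

// Connector/J 6.x and later ship com.mysql.cj.jdbc.Driver;
// Connector/J 5.1 uses the older com.mysql.jdbc.Driver.
val candidates = Seq("com.mysql.cj.jdbc.Driver", "com.mysql.jdbc.Driver")

// find returns the first candidate that loads, checking the newer name first
candidates.find(driverAvailable) match {
  case Some(cls) => println(s"Found $cls; use it as the driver connection property")
  case None      => println("No MySQL driver on the classpath; check the interpreter dependency")
}
```

Whichever class is found is the value to pass as the driver connection property; if neither is found, the dependency from step 2 has not reached the interpreter's classpath.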
      

    Testing Notes:

    1. EMR: emr-5.10.0 with Pig 0.17.0, Zeppelin 0.7.3, and Spark 2.2.0
    2. MySQL: MariaDB 5.2.10

    References

    1. Apache Hive (n.d.). Home. [online] Cwiki.apache.org. Available at: https://cwiki.apache.org/confluence/display/Hive/Home [Accessed 1 Dec. 2017].
    2. Apache Spark (n.d.). Compatibility with Apache Hive. [online] spark.apache.org. Available at: https://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive [Accessed 1 Dec. 2017].