amazon-web-services emr amazon-emr apache-zeppelin

AWS EMR Zeppelin is missing the MySQL interpreter

I launched a fresh AWS EMR Spark cluster with Zeppelin to query a MySQL database. When I tried to add a MySQL interpreter in Zeppelin, the option did not exist. I searched for a way to make the interpreter appear but didn't find a solution. How can I get the MySQL interpreter in Zeppelin so I can query the MySQL database?

Solution

Spark SQL supports many features of SQL:2003 and SQL:2011 [1][2], so you may consider querying MySQL through Spark on Zeppelin by adding the MySQL connector as a dependency.

1. Get a MySQL connector JAR in a version that matches your database.
2. Add it as a dependency of the Spark interpreter in Zeppelin. (I put the JAR on the master machine.)
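
To confirm that the connector was actually picked up after restarting the Spark interpreter, a quick check like the one below may help. It is only a sketch: `driverAvailable` is a hypothetical helper name, and it assumes the JAR added to the interpreter is visible to the class loader used by the Zeppelin paragraph. The two class names are the driver classes shipped by Connector/J 8.x and 5.1.x respectively.

```scala
import scala.util.Try

// Hypothetical helper: true when the named class can be loaded from the current classpath.
def driverAvailable(className: String): Boolean =
  Try(Class.forName(className)).isSuccess

// Connector/J 8.x ships com.mysql.cj.jdbc.Driver; Connector/J 5.1.x ships com.mysql.jdbc.Driver.
val candidates = Seq("com.mysql.cj.jdbc.Driver", "com.mysql.jdbc.Driver")

candidates.foreach { name =>
  val status = if (driverAvailable(name)) "found" else "missing"
  println(s"$name -> $status")
}
```

If both classes are reported as missing, the dependency was probably not loaded; if only the second one is found, use `com.mysql.jdbc.Driver` as the `driver` property instead.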

You should now be able to access a MySQL table. The following example uses the Scala API:

/* Database configuration: HOST, DATABASE, USERNAME, and PASSWORD are placeholders for your own values */
val jdbcURL = s"jdbc:mysql://${HOST}/${DATABASE}"
val jdbcUsername = s"${USERNAME}"
val jdbcPassword = s"${PASSWORD}"

import java.util.Properties
val connectionProperties = new Properties()
connectionProperties.put("user", jdbcUsername)
connectionProperties.put("password", jdbcPassword)
connectionProperties.put("driver", "com.mysql.cj.jdbc.Driver")

/* Read data from MySQL: replace the "${TABLE NAME}" placeholder with the name of the table to load */
val desiredData = spark.read.jdbc(jdbcURL, "${TABLE NAME}", connectionProperties)
desiredData.printSchema

/* Data Manipulation */
desiredData.createOrReplaceTempView("desiredData")
val query = s"""
SELECT COUNT(*) AS `Record Number`
FROM desiredData
"""
spark.sql(query).show

val query2 = s"""
SELECT ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column1, column2) AS column3
FROM desiredData
"""
spark.sql(query2).show
.
.
.
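
If you would rather have MySQL run part of the query instead of loading the whole table into Spark, the `table` argument of `spark.read.jdbc` also accepts anything that is valid in a `FROM` clause, such as a parenthesized subquery with an alias. The following is a minimal sketch under that assumption; `asDerivedTable` and `some_table` are hypothetical names used only for illustration.

```scala
// Hypothetical helper: wraps a SQL statement as an aliased derived table so it can be
// passed where Spark's JDBC reader expects a table name.
def asDerivedTable(sql: String, alias: String): String = {
  // Drop surrounding whitespace and a trailing semicolon, which is not valid inside a subquery.
  val trimmed = sql.trim.stripSuffix(";").trim
  s"($trimmed) AS $alias"
}

// some_table and column1 are placeholders; substitute your own table and column.
val pushedDown = asDerivedTable(
  "SELECT column1, COUNT(*) AS n FROM some_table GROUP BY column1;",
  "t"
)
println(pushedDown)

// In a Zeppelin %spark paragraph, with jdbcURL and connectionProperties defined as above:
// val grouped = spark.read.jdbc(jdbcURL, pushedDown, connectionProperties)
// grouped.show
```

With this approach the aggregation runs on the MySQL side and only the grouped rows are transferred to Spark.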

Testing Notes:

EMR: emr-5.10.0 with Pig 0.17.0, Zeppelin 0.7.3, and Spark 2.2.0
MySQL: MariaDB 5.2.10

References

[1] Apache Hive (n.d.). Home. [online] Cwiki.apache.org. Available at: https://cwiki.apache.org/confluence/display/Hive/Home [Accessed 1 Dec. 2017].
[2] Apache Spark (n.d.). Compatibility with Apache Hive. [online] spark.apache.org. Available at: https://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive [Accessed 1 Dec. 2017].