I have been trying but failing to write/read tables from MySQL Server 8.0.19 on localhost on Windows 10 with pyspark from Google colab. There's also a lot of similar questions and with some suggested answers but none of the solutions seem to work here. Here is my code:
<...installations ...>
from pyspark.sql import SparkSession
spark = SparkSession\
.builder\
.appName("Word Count")\
.config("spark.driver.extraClassPath", "/content/spark-2.4.5-bin-hadoop2.7/jars/mysql-connector-java-8.0.19.jar")\
.getOrCreate()
An here is the connection string:
MyjdbcDF = spark.read.format("jdbc")\
.option("url", "jdbc:mysql://127.0.0.1:3306/mydb?user=testuser&password=pwtest")\
.option("dbtable", "collisions")\
.option("driver","com.mysql.cj.jdbc.Driver")\
.load()
I have as well used the .option("driver","com.mysql.jdbc.Driver")
but still keep getting this error:
Py4JJavaError: An error occurred while calling o154.load.
com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
...
...
...
Caused by: java.net.ConnectException: Connection refused (Connection refused)
From this, I guess that MySQL Sever is not reachable.
I have Telnetted to port 3306 & it confirmed that MySQL Server is accepting connections from client machine. I have read that running: netsh advfirewall firewall add rule name="MySQL Server" action=allow protocol=TCP dir=in localport=3306
will permitting firewall rule for MySQL Server incase it was being blocked, yet no change.
Can somebody help outpy?
After about several days of trials I discovered a solution, that is why I am going to answer my own question. I was able to connect using a WAMP server (thanks to @Shubham Jain for suggesting) and as well as without a WAMP server. This answer is without WAMP server.
Downloaded ngrok
from https://ngrok.com/,
Unzipped it,
Saved it on my local windows,
Authinticate with:
./ngrok authtoken xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(pretty straight forward instructions available on the website)
Still on my local Windows, I copied and ran ngrok tcp 3306
on the command line
C:\Users\userMe> ngrok tcp 3306
and it gave someting like:
ngrok by @inconshreveable
Session Status online
Account userMe (Plan: Free)
Version 2.3.35
Region United States (us)
Web Interface http://localhost:4041
Forwarding tcp://0.tcp.ngrok.io:17992 -> localhost:3306
Connections ttl opn rt1 rt5 p50 p90
0 0 0.00 0.00 0.00 0.00
Where 0.tcp.ngrok.io:17992
is the only thing I am interested in and where 3306
is MySQL's and only port I am interested at exposing to the internet to link with my Google Colab.
So, at the end of the day, my PySpark READ connection will look like:
jdbcDF = spark.read.format("jdbc")\
.option("url", "jdbc:mysql://0.tcp.ngrok.io:17992/mydb?user=testUser&password=pestpw")\
.option("dbtable", "pipeLineTable")\
.option("driver","com.mysql.cj.jdbc.Driver")\
.load();
The WRITE connection will be:
jdbcDF.write.mode("overwrite")\
.format("jdbc")\
.option("url", f"jdbc:mysql://0.tcp.ngrok.io:17992/mydb")\
.option("dbtable", "fromGcTable")\
.option("user", "testUser")\
.option("password", "testpw")\
.option("driver","com.mysql.cj.jdbc.Driver")\
.save()
In both connection strings, note the 0.tcp.ngrok.io:17992
that replaces localhost:3306