Tags: pyspark, hbase, apache-phoenix

Spark 2.2.0 unable to connect to Phoenix 4.11.0 when loading a table into a DataFrame


I'm using the tech stack below and trying to query Phoenix tables from PySpark. I downloaded the following jars from the URL below and tried executing the code that follows. The logs show that the connection to HBase is established, but then the console just hangs. Please let me know if anybody has encountered and fixed a similar issue.

https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-spark/4.11.0-HBase-1.2

Jars: phoenix-spark-4.11.0-HBase-1.2.jar, phoenix-client.jar

Tech stack (all running on the same host):

Apache Spark 2.2.0

HBase 1.2

Phoenix 4.11.0

Copied hbase-site.xml to /spark/conf/hbase-site.xml.
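For context, the main thing the Phoenix/HBase client reads from hbase-site.xml is the ZooKeeper quorum. A minimal client-side sketch, assuming the single-host setup described here (the property names are standard HBase client settings; the values are assumptions):

<configuration>
  <property>
    <!-- Where the HBase/Phoenix client finds ZooKeeper -->
    <name>hbase.zookeeper.quorum</name>
    <value>master01</value>
  </property>
  <property>
    <!-- Default ZooKeeper client port -->
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>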

Command executed:

/usr/local/spark> spark-submit phoenix.py --jars /usr/local/spark/jars/phoenix-spark-4.11.0-HBase-1.2.jar --jars /usr/local/spark/jars/phoenix-client.jar
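One thing worth noting: spark-submit treats everything after the application file as arguments to the application itself, so the two --jars flags above were most likely never seen by Spark (the classes were presumably still found because the jars sit in /usr/local/spark/jars, which is on Spark's classpath by default). The conventional form is a single comma-separated --jars placed before the script:

spark-submit \
  --jars /usr/local/spark/jars/phoenix-spark-4.11.0-HBase-1.2.jar,/usr/local/spark/jars/phoenix-client.jar \
  phoenix.py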

phoenix.py:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("pysparkPhoenixLoad").setMaster("local")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load the Phoenix table into a DataFrame via the phoenix-spark connector.
df = sqlContext.read.format("org.apache.phoenix.spark") \
    .option("table", "schema.table1") \
    .option("zkUrl", "localhost:2181") \
    .load()
df.show()
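As a side note, on Spark 2.x SparkSession is the idiomatic entry point that replaces SQLContext; an equivalent sketch (same assumed table name and zkUrl as above):

from pyspark.sql import SparkSession

# SparkSession wraps SparkContext and SQLContext in Spark 2.x.
spark = SparkSession.builder \
    .appName("pysparkPhoenixLoad") \
    .master("local") \
    .getOrCreate()

df = spark.read.format("org.apache.phoenix.spark") \
    .option("table", "schema.table1") \
    .option("zkUrl", "localhost:2181") \
    .load()
df.show()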

Error log: the HBase connection is established, but the console hangs and a timeout error is eventually thrown:

18/07/30 12:28:15 WARN HBaseConfiguration: Config option "hbase.regionserver.lease.period" is deprecated. Instead, use "hbase.client.scanner.timeout.period"

18/07/30 12:28:54 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=38367 ms ago, cancelled=false, msg=row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=master01,16020,1532591192223, seqNum=0


Solution

  • Take a look at these answers:

    Both of the issues happened in Java (with JDBC), but it looks like it's a similar issue here.

    Try adding the ZooKeeper hostname (master01, as I see in the error message) to your /etc/hosts:

    127.0.0.1    master01
    

    if you are running your whole stack locally. (A quick reachability check is sketched below.)
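To verify the fix, a quick reachability check can be run from the same host. This is a minimal sketch: the hostname and ports (master01, 2181 for ZooKeeper, 16020 for the HBase region server) are taken from the error log above; adjust them if your setup differs.

import socket

# Hostname and ports from the error log; adjust to your environment.
checks = [("master01", 2181),    # ZooKeeper
          ("master01", 16020)]   # HBase region server

for host, port in checks:
    try:
        addr = socket.gethostbyname(host)  # does the name resolve?
        sock = socket.create_connection((host, port), timeout=5)
        sock.close()
        print("{0} -> {1}, port {2} reachable".format(host, addr, port))
    except socket.error as e:
        print("{0}:{1} failed: {2}".format(host, port, e))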