Tags: python, apache-spark, hbase, happybase

How to put values into an HBase table through happybase?


My development environment is CentOS 7, HBase 1.2.5, happybase 1.1.0, Python 2.7, PyCharm, Hadoop 2.7.3, and Spark 2.1. I am developing big data software and need to put values into an HBase table. The values come from a Spark RDD. Here is the code:

import happybase
from pyspark import SparkContext, SparkConf

connection = happybase.Connection('localhost')
table = connection.table('tablename')
conf = SparkConf().setAppName("myFirstSparkApp").setMaster("local")
sc = SparkContext(conf=conf)
distFile = sc.textFile("/inputFilePath/")
newLines = distFile.filter(lambda x: 'filter":' in x)
newLines = newLines.map(lambda line: line.split('"'))
# The following line works: it inserts a single row into the table.
table.put(b'row-key0', {'billCode:': '222', 'trayCode:': '222', 'pipeline:': '333'})
# But the following line does not work. What is wrong? Why?
newLines.foreach(lambda x: table.put(b'row-key', {'billCode:': x[7], 'trayCode:': x[3], 'pipeline:': x[11]}))

The last line fails with the following errors:

ImportError: No module named cybin
pickle.PicklingError: Could not serialize object: ImportError: No module named cybin

I am new to Spark, happybase, and Python. How can I resolve this? Any help would be appreciated. Thank you.


Solution

  • The happybase Connection and Table objects cannot be pickled, so Spark cannot serialize the lambda in your foreach and ship it to the executors; that is what the PicklingError is telling you. Open the connection inside the function that runs on the workers instead. Here's a simple example:

    import happybase
    from pyspark import SparkContext, SparkConf

    conf = SparkConf().setAppName("App").setMaster("local")
    sc = SparkContext(conf=conf)
    rdd = sc.parallelize([("a", "1"), ("b", "2")])

    def func(x):
        # Open the connection on the worker; happybase connections
        # cannot be pickled and sent from the driver.
        conn = happybase.Connection('localhost')
        table = conn.table("table_name")
        table.put(x[0], {"cf:c": x[1]})
        conn.close()

    rdd.foreach(func)
    

    This is still not perfect, because it opens a new connection for every element. See the design patterns for using foreachRDD in the Spark Streaming programming guide for ways to reuse connections: http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd Good luck.
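
    Following that guide's advice, one refinement is to open a single connection per partition with foreachPartition and buffer the writes in a happybase batch. A minimal sketch, assuming the same localhost HBase and the same placeholder table_name as above:

    def write_partition(rows):
        # One connection per partition instead of one per element.
        conn = happybase.Connection('localhost')
        table = conn.table("table_name")
        # table.batch() buffers the puts and sends them together on exit.
        with table.batch() as batch:
            for key, value in rows:
                batch.put(key, {"cf:c": value})
        conn.close()

    rdd.foreachPartition(write_partition)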