Search code examples
pythonsqliteapache-sparkibm-cloudobject-storage

How do I read a SQLite file stored in Object Storage from a Python notebook in Spark as a Service all on Bluemix?


I have a SQLite file in an Object Storage instance on IBM's Bluemix. I want to access that file from a Python notebook in Spark as a Service. Right now, the following code does not work:

df = sqlContext.read.format('jdbc').\
     options(url='jdbc:sqlite:swift://container.spark/db.sqlite',\
     dbtable='schema.tablename').load()
print df.take(10)

The error I get is: No suitable driver found for jdbc:sqlite://container.spark/db.sqlite

I've been looking for documentation for this, but I haven't been able to find it. Any help would be much appreciated. How do I access a SQLite file on Object Storage from Spark as a Service on Bluemix?


Solution

  • You are missing one parameter to specify in option that is which driver to look for.

    df = sqlContext.read.format('jdbc').\
         options(url='jdbc:sqlite:Chinook_Sqlite.sqlite',\
         dbtable='employee',driver='org.sqlite.JDBC').load()
    

    Add driver='org.sqlite.JDBC' and you would note get the driver not found.

    But for reading sqlite from Object storage, i don't think that is supported by org.sqlite.JDBC. The only three methods allowed are absolute path, relative path from your file system and memory. https://bitbucket.org/xerial/sqlite-jdbc

    In below example, i have downloaded sample sqlite database in Bluemix spark service's local disk and then used it.

    https://cdsx.ng.bluemix.net/data/notebooks/1fa97ca4-721a-41b2-a958-ee58bc47577b/view?access_token=5f4e48fd292c59949ffc969b048ab849861b0a95ed708d0d679bf65ce2bf1588

    Github importable notebook:- https://github.com/charles2588/bluemixsparknotebooks/blob/master/sqllite_jdbc.ipynb

    Thanks, Charles.