
PySpark+Flask+CherryPy - AttributeError: 'module' object has no attribute 'tree'


I'm trying to integrate Flask with a Spark model by following this tutorial: https://www.codementor.io/spark/tutorial/building-a-web-service-with-apache-spark-flask-example-app-part2#/ . CherryPy serves as the WSGI server there. The trouble is that when I launch the app via spark-submit, it prints this stack trace:

Traceback (most recent call last):
  File "/home/roman/dev/python/flask-spark/cherrypy.py", line 43, in <module>
    run_server(app)
  File "/home/roman/dev/python/flask-spark/cherrypy.py", line 21, in run_server
    cherrypy.tree.graft(app_logged, '/')
AttributeError: 'module' object has no attribute 'tree'

I have no idea where the trouble is. I suspect a version mismatch or something like that, but I'm not sure. I also tried Python 3 instead of Python 2, but it didn't help. Here is the WSGI config:

import time, sys, cherrypy, os
from paste.translogger import TransLogger
from webapp import create_app
from pyspark import SparkContext, SparkConf

def init_spark_context():
    # load spark context
    conf = SparkConf().setAppName("movie_recommendation-server")
    # IMPORTANT: pass additional Python modules to each worker
    sc = SparkContext(conf=conf, pyFiles=['test.py', 'webapp.py'])

    return sc


def run_server(app):

    # Enable WSGI access logging via Paste
    app_logged = TransLogger(app)

    # Mount the WSGI callable object (app) on the root directory
    cherrypy.tree.graft(app_logged, '/')

    # Set the configuration of the web server
    cherrypy.config.update({
        'engine.autoreload.on': True,
        'log.screen': True,
        'server.socket_port': 5432,
        'server.socket_host': '0.0.0.0'
    })

    # Start the CherryPy WSGI web server
    cherrypy.engine.start()
    cherrypy.engine.block()


if __name__ == "__main__":
    # Init spark context and load libraries
    sc = init_spark_context()
    dataset_path = os.path.join('datasets', 'ml-latest-small')
    app = create_app(sc, dataset_path)

    # start web server
    run_server(app)

Solution

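  • The traceback you've provided shows that your app is importing a local module named cherrypy (/home/roman/dev/python/flask-spark/cherrypy.py), not the actual CherryPy library (which would live somewhere like /path/to/your/python/lib/python-version/siteX.Y/cherrypy). Python searches the script's own directory before the installed packages, so import cherrypy picks up your cherrypy.py, and that file has no tree attribute.

    To solve this problem, rename the local file to something that does not collide with a library name. On Python 2, also delete any leftover cherrypy.pyc next to it, since the compiled file would still be picked up.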
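
    The quickest check is to print cherrypy.__file__ right after the import and see which file was actually loaded. For a check that does not depend on the import having happened, here is a minimal sketch for Python 3; the helper name find_shadowing_file is made up for illustration and is not part of CherryPy or Spark. It asks the import machinery whether anything in the script's directory would be imported under the name cherrypy.

```python
import os
import sys
from importlib.machinery import PathFinder


def find_shadowing_file(module_name, script_dir):
    """Return the path of a file in script_dir that would be imported as
    module_name, or None if nothing in that directory matches.

    Hypothetical helper, written only to illustrate the diagnosis.
    """
    # PathFinder searches only the directories passed in and does not
    # consult sys.modules, so an already-imported module cannot hide the result.
    spec = PathFinder.find_spec(module_name, [script_dir])
    if spec is None or spec.origin is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # sys.path[0] is normally the directory of the script being run.
    script_dir = sys.path[0] or os.getcwd()
    hit = find_shadowing_file("cherrypy", script_dir)
    if hit:
        print("Local file shadows the cherrypy library:", hit)
    else:
        print("No local cherrypy module found in", script_dir)
```

    If this reports a file inside your project directory, that file is the one to rename.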