python, multithreading, twisted, websocket, scrapy

WebSocket server with Twisted and Python doing complex jobs in the background


I want to code a server which handles WebSocket clients while doing MySQL selects via SQLAlchemy and scraping several websites at the same time (Scrapy). The received data has to be processed, saved to the db, and then sent to the WebSocket clients.

My question is how this can be done in Python from a logical point of view. How do I need to structure the code, and which modules are the best fit for this job? At the moment I'm set on using Twisted with threads in which the scraping and select work would run. But can this be done in an easier way? I only find simple Twisted examples, but this obviously seems to be a more complex job. Are there similar examples? How do I start?


Solution

  • Cyclone, a Twisted-based 'network toolkit' based on (and similar to) Facebook/FriendFeed's Tornado server, contains support for WebSockets: https://github.com/fiorix/cyclone/blob/master/cyclone/web.py#L908

    The cyclone repository ships demo code for its WebSocket support, and txwebsocket comes with similar example code; a minimal sketch is given below.
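
    As a rough illustration only (the handler name, route, and port below are invented, and the exact module layout of cyclone's WebSocket support may vary between versions), an echo-style handler could look something like this:

        # Hedged sketch of a cyclone WebSocket echo server.
        # EchoHandler, the /websocket route and port 8888 are illustrative.
        import cyclone.web
        import cyclone.websocket
        from twisted.internet import reactor

        class EchoHandler(cyclone.websocket.WebSocketHandler):
            def connectionMade(self):
                # Handshake finished; the client is now connected.
                print("client connected")

            def messageReceived(self, message):
                # Echo each frame straight back to the client.
                self.sendMessage(message)

            def connectionLost(self, reason):
                print("client disconnected")

        if __name__ == "__main__":
            app = cyclone.web.Application([(r"/websocket", EchoHandler)])
            reactor.listenTCP(8888, app)
            reactor.run()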

    You may have a problem using SQLAlchemy with Twisted; from what I have read, they do not work well together (source). Are you married to SQLA, or would another, more compatible OR/M suffice?

    Some Twisted-friendly OR/Ms include Storm (a fork) and Twistar, and you can always fall back on Twisted's core db abstraction library, twisted.enterprise.adbapi (sketched below). There are also async-friendly db libraries for specific products, such as txMySQL, txMongo, txRedis, and Paisley (CouchDB).
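
    If you do fall back on twisted.enterprise.adbapi, the pattern is small: a ConnectionPool runs the blocking DB-API calls in a thread pool and hands the results back as Deferreds, so the reactor (and your WebSocket connections) never block. A minimal sketch, with placeholder connection parameters and query:

        # Minimal adbapi sketch; driver name, credentials and query are placeholders.
        from twisted.enterprise import adbapi

        dbpool = adbapi.ConnectionPool(
            "MySQLdb",           # any DB-API 2.0 driver module name
            host="localhost",
            user="user",
            passwd="secret",
            db="mydb",
        )

        def got_rows(rows):
            # Runs back in the reactor thread with the fetched rows.
            for row in rows:
                print(row)

        d = dbpool.runQuery("SELECT id, price FROM items WHERE active = %s", (1,))
        d.addCallback(got_rows)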

    You could conceivably run both Cyclone (or txwebsocket) and Scrapy as child services of the same MultiService, listening on different ports but packaged within the same Application instance. The services may communicate, either through the parent service or over some RPC mechanism (JSON-RPC, Perspective Broker, AMP, XML-RPC, etc.), or you can simply write to the db from the Scrapy service and read from it in the WebSocket service. Redis would be great for this, IMO.
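
    As a sketch of that layout (the service names, port, and route are invented here, and the scraper is reduced to a stub), a .tac file wiring both pieces under one MultiService might look roughly like this; you would run it with twistd -ny:

        # Hedged sketch: one Application, two child services.
        import cyclone.web
        import cyclone.websocket
        from twisted.application import internet, service

        class EchoHandler(cyclone.websocket.WebSocketHandler):
            def messageReceived(self, message):
                self.sendMessage(message)

        class ScraperService(service.Service):
            # Hypothetical stand-in for whatever drives the scraping
            # (e.g. a LoopingCall that kicks off Scrapy crawls).
            def startService(self):
                service.Service.startService(self)
                # start the scraping loop here

        application = service.Application("scrape-and-push")
        top = service.MultiService()
        top.setServiceParent(application)

        # WebSocket endpoint on its own port ...
        websocket_app = cyclone.web.Application([(r"/websocket", EchoHandler)])
        internet.TCPServer(8888, websocket_app).setServiceParent(top)

        # ... and the scraper as a sibling child service.
        ScraperService().setServiceParent(top)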