java, content-management-system, screen-scraping

Java Framework - Using screen scraping to mesh heterogeneous server environments


OK. So I have a CMS written in Java that satisfies the needs of several hundred clients. But periodically, a client will need a specialized application: for example, a class registration database application.

So let's say I don't feel like writing it, or I'm too busy, and I outsource it to someone else. I don't want their code on my server because they code in a language my server environment doesn't support, so I have the developer host it on a cheap server somewhere else. But I still want the application to appear on my client's main website (hosted by my CMS), within the CMS's template, on the client's primary domain.

How can I achieve this? Can I use some type of screen-scraper/proxy that intercepts client requests on my site and passes them to the external server, which renders the HTML that I then integrate back into my template? How would I handle the subsequent back-and-forth requests in a truly interactive application?

So that's what I want to do (I think). Does anyone have any insight or experience with this type of thing? What are the pitfalls? Are there products that do this easily?


Solution

  • Write a servlet Filter, configured in your web.xml, that intercepts the requests you'd like to outsource (matched by URL pattern, for example). The filter can then use Commons HttpClient to make the actual request to the external system and pipe the response straight back to the user. Basically you're building a custom HTTP proxy. You could even add things like content decoration (a common header, say; check out SiteMesh), security, URL rewriting, etc. You might also want to cache responses to offset the performance penalty of proxying every request.

    If you need to support sessions it gets trickier, but you could do it by passing the JSESSIONID value along to your partners and adding some session replication mechanism. For example, you could provide a web service that takes a session ID and returns the serialized session object for your partners to use.
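To make the proxy idea a bit more concrete, here is a rough sketch of the core steps a filter like that would perform: map the incoming path to the external URL, fetch the page, rewrite links so they point back at your site, and drop the result into your template. It is only an illustration under several assumptions. It uses the JDK's built-in java.net.http.HttpClient (Java 11+) in place of Commons HttpClient so that it has no dependencies, and the class name, the /apps/registration prefix, the partner.example.com host, the {{content}} placeholder, and the X-CMS-Session header are all made up for the example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper holding the proxy logic; a servlet Filter would call into it.
class ExternalAppProxy {
    // Assumed values: the local URL prefix to intercept and the external application's base URL.
    static final String LOCAL_PREFIX = "/apps/registration";
    static final String UPSTREAM_BASE = "http://partner.example.com/registration";

    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    /** Maps a CMS request path (plus its raw, already-encoded query string) to the external URL. */
    static URI toUpstream(String requestPath, String query) {
        if (!requestPath.startsWith(LOCAL_PREFIX)) {
            throw new IllegalArgumentException("Not a proxied path: " + requestPath);
        }
        String rest = requestPath.substring(LOCAL_PREFIX.length());
        String suffix = (query == null || query.isEmpty()) ? "" : "?" + query;
        return URI.create(UPSTREAM_BASE + rest + suffix);
    }

    /**
     * Naive URL rewriting: absolute links to the external host are pointed back
     * at the proxied prefix so follow-up clicks and form posts hit the CMS again.
     * Real pages would likely need an HTML parser rather than string replacement.
     */
    static String rewriteLinks(String html) {
        return html.replace(UPSTREAM_BASE, LOCAL_PREFIX);
    }

    /** Inserts the external HTML fragment into the CMS template at the assumed placeholder. */
    static String decorate(String template, String fragment) {
        return template.replace("{{content}}", fragment);
    }

    /** Fetches the external page, passing the session ID along in an assumed custom header. */
    static String fetch(URI upstream, String sessionId) throws IOException, InterruptedException {
        HttpRequest.Builder builder = HttpRequest.newBuilder(upstream).GET();
        if (sessionId != null) {
            builder.header("X-CMS-Session", sessionId);
        }
        HttpResponse<String> response =
                CLIENT.send(builder.build(), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Inside the filter's doFilter you would then do something along the lines of toUpstream(request.getRequestURI(), request.getQueryString()), pass the result to fetch, and write decorate(template, rewriteLinks(body)) to the response. This only covers GET; form posts, response headers, cookies, error statuses, and caching would all need their own handling in a real proxy.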