I'm going to design an open-source web service which should collect ("web-scrape") some data from multiple - currently three - web sites.
The web sites do not expose any web service nor any API, they just publish web pages.
Data will be collected 'live' on any client's request from all the web sites in parallel, and will then be parsed to XML to be returned to the client.
The server operating system will be Linux.
The clients will initially be just an Android application of mine.
The concurrent clients will possibly be about 100 or more, if the project will be successful... ;-).
Currently my preferencese go to the adoption of:
Since I suppose the specifications on the project can be of general use and interest, and since I am getting many problems with the combined use of Coro with mod_perl2, I ask:
Are my adoption preferences well matched?
Do you see any incompatibilities or potential problems?
Do you have any suggestion to enhance (in this order):
You probably don't want to develop using mod_perl for any new project anymore. You really want to use something Plack based, or maybe even Plack itself. If you want to use Coro, using a AnyEvent such as Twiggy based backend may make most sense (though you may want to put a reverse proxy in front of it).