My current web-app project calls for a little NLP:
... much of which is childishly easy if you’ve got NLTK, which I do, sort of: the app backend is Django on Tornado, so you’d think doing these things would be a non-issue.
However, I have to give the user interactive feedback that depends on the tokenizers, so I need to tokenize the data client-side.
Right now I actually am using NLTK, via a REST API call to a Tornado process that wraps the NLTK function and little else. At the moment, the latency and concurrency of this ad-hoc service are obviously suboptimal, to put it politely. What I should be doing, I think, is getting my hands on CoffeeScript/JavaScript versions of this function, if not reimplementing it myself.
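For illustration only, a thin wrapper of this kind has roughly the following shape. This is a sketch under stated assumptions, not the actual service: the regex `tokenize` is a hypothetical stand-in for the NLTK call, `handle_tokenize_request` and the `{"text": ...}` payload format are made-up names, and the Tornado plumbing is left out so the snippet runs on the standard library alone.

```python
import json
import re


def tokenize(text):
    # Hypothetical stand-in for the NLTK tokenizer: runs of word
    # characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def handle_tokenize_request(body):
    """Take a raw JSON request body (bytes), return (status, response bytes).

    The request is assumed to look like {"text": "..."}; that shape is an
    assumption made for this sketch.
    """
    try:
        text = json.loads(body.decode("utf-8"))["text"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a 'text' field"}
        return 400, json.dumps(error).encode("utf-8")
    if not isinstance(text, str):
        error = {"error": "'text' must be a string"}
        return 400, json.dumps(error).encode("utf-8")
    return 200, json.dumps({"tokens": tokenize(text)}).encode("utf-8")
```

Every keystroke-level update pays a full HTTP round trip through a function like this, which is where the latency concern comes from.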
And but so then, from what I've seen, JavaScript hasn’t been considered cool long enough to have accumulated the general-purpose, not-just-web-specific library smörgåsbord one can find in C or Python (or even Erlang). NLTK is of course a standout project by anyone’s measure, but I only need a few percent of what it is packing.
But so now I am at a crossroads — I have to double down on either:
Or something else entirely. What should I do? Like to start things off. This is my question. I’m open to solutions involving an atypical approach — as long as your recommendation is not distasteful (e.g. “use Silverlight”) and/or a time vortex (e.g. “get a computational linguistics PhD you troglodyte”) I am game. Thank you in advance.
I think that, as you wrote in the comment, the amount of data that efficient algorithms need will eventually prevent you from doing things client-side. Even basic processing requires lots of data, such as bigram/trigram frequencies. Symbolic approaches also need significant data (grammar rules, dictionaries, etc.). In my experience, you can't run a good NLP process without at least 3 MB to 5 MB of data, which I think is too much for today's clients.
So I would do things over the wire. For that I would recommend an asynchronous/push approach, perhaps using Faye or Socket.io. I'm sure you can achieve a perfect and fluid UX as long as the user is not stuck while the client waits for the server to process the text.
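To make the "user is not stuck" part concrete, here is a rough sketch of the pattern in Python's `asyncio`. It is only an illustration: the network round trip is simulated with `asyncio.sleep` rather than Faye or Socket.io, the regex `tokenize` is a stand-in for the real server-side tokenizer, and `LiveFeedback`, `server_tokenize`, and `demo` are hypothetical names. The idea is that each input fires a request without blocking, and a response is applied only if no newer request has been sent since.

```python
import asyncio
import re


def tokenize(text):
    # Stand-in for the real server-side tokenizer (e.g. an NLTK call).
    return re.findall(r"\w+|[^\w\s]", text)


async def server_tokenize(text, delay):
    # Simulated round trip: wait for `delay` seconds, then answer.
    await asyncio.sleep(delay)
    return tokenize(text)


class LiveFeedback:
    """Shows only the newest result; stale responses are dropped."""

    def __init__(self):
        self.latest_id = 0
        self.tokens = []

    async def on_input(self, text, delay=0.05):
        # Tag the request so a late response can be recognised as stale.
        self.latest_id += 1
        request_id = self.latest_id
        result = await server_tokenize(text, delay)
        if request_id == self.latest_id:
            self.tokens = result


async def demo():
    ui = LiveFeedback()
    # Two inputs in quick succession; the response to the first one
    # arrives after the response to the second and is ignored.
    await asyncio.gather(
        ui.on_input("Hello", delay=0.10),
        ui.on_input("Hello, world!", delay=0.01),
    )
    return ui.tokens
```

Running `asyncio.run(demo())` leaves the tokens for the second, newer input in place even though the first response comes back last. The same request-tagging idea carries over to a push transport, where the server sends results back whenever they are ready.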