I was very impressed with the OpenCalais system. It is a web service: you send it your text, it analyzes it, and it returns a series of categorized (RDF-enabled) tags that apply to your document.
But - at the moment - English is the only supported language.
Do you know of similar systems that handle documents in other languages? (I'm interested in Italian, but multi-language support is a plus, of course.)
Apache Stanbol can analyze texts in many different languages. So far the following languages are supported (precision and recall values may vary according to the language):
The analysis will return the discovered entities. The analysis output format can be:
Entity extraction, or tagging, of texts can be further tailored through the system configuration. In principle, any custom vocabulary can be plugged into the system.
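To give an idea of what submitting a text for analysis might look like, here is a minimal Python sketch. It assumes a Stanbol instance reachable at `http://localhost:8080/enhancer` and that it accepts a plain-text POST with the desired output format in the `Accept` header. The URL, the helper names and the default `Accept` value are my own placeholders, so check them against the documentation of the instance you are using.

```python
import urllib.request

# Assumption: a locally running Stanbol instance. Host, port and path are
# placeholders, not taken from this answer.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, accept="application/json", url=STANBOL_ENHANCER_URL):
    """Build (but do not send) a POST request carrying the text to analyze.

    `accept` names the serialization you would like back. Which values the
    server honours is an assumption to verify against its documentation.
    """
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,
        },
        method="POST",
    )


def analyze(text, accept="application/json"):
    """Send the text to the service and return the raw response body."""
    request = build_enhancer_request(text, accept)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `analyze("Roma è la capitale d'Italia.")` against a running instance should return the serialized analysis results as a string. Building the request is kept separate from sending it so the headers can be inspected without a server.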
There are a couple of demo end-points:
I'm not sure whether all of the languages above are supported at these end-points.
RedLink GmbH is going to provide cloud services based on Apache Stanbol and related software.
The WordLift plugin for WordPress already provides text analysis within WordPress for all of the languages mentioned above (currently in testing stage). You can try it out by installing the plugin in WordPress and submitting text in the post body.
You can also subscribe and write to the Apache Stanbol mailing list for specific requests or information.