
Multilanguage OpenCalais-like system?


I was very impressed with the OpenCalais system. It is a web service: you send it your text, it analyzes it, and it returns a series of categorized (RDF-enabled) tags that describe your document.

At the moment, however, English is the only supported language.

Do you know of similar systems that handle multilanguage documents? (I'm interested in Italian, but support for multiple languages is a plus, of course.)


Solution

  • Apache Stanbol can analyze texts in many different languages. So far the following languages are supported (precision and recall values may vary according to the language):

    • English,
    • 中文 (Chinese),
    • Español (Spanish),
    • Русский (Russian),
    • Português (Portuguese),
    • Deutsch (German),
    • Italiano (Italian),
    • Nederlands (Dutch),
    • Svenska (Swedish),
    • Dansk (Danish),
    • العربية (Arabic),
    • עברית (Hebrew),
    • 日本語 (Japanese).

    The analysis will return the discovered entities. The analysis output format can be:

    • JSON-LD,
    • RDF/XML,
    • RDF/JSON,
    • Turtle,
    • N-Triples.
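
    As a rough illustration of how such a request might look, the sketch below builds an HTTP POST for Stanbol's enhancer endpoint and picks the output format through the Accept header. The URL (a local launcher at http://localhost:8080/enhancer), the helper names, and the media type chosen for each format are assumptions; check them against the Stanbol version you actually run.

```python
import urllib.request

# Assumption: a Stanbol launcher running locally; replace with your own endpoint.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"

# Assumption: media types for each output format, as recalled from the
# Stanbol enhancer documentation. Verify them against your installation.
OUTPUT_FORMATS = {
    "json-ld": "application/json",
    "rdf/xml": "application/rdf+xml",
    "rdf/json": "application/rdf+json",
    "turtle": "text/turtle",
    "n-triples": "text/rdf+nt",
}


def build_enhancer_request(text, output_format="turtle", url=STANBOL_ENHANCER_URL):
    """Build (but do not send) a POST request carrying plain text to analyze."""
    accept = OUTPUT_FORMATS[output_format]
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,
        },
        method="POST",
    )


def enhance(text, output_format="turtle", url=STANBOL_ENHANCER_URL):
    """Send the text to the enhancer and return the serialized RDF as a string."""
    request = build_enhancer_request(text, output_format, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

    Calling enhance("Roma è la capitale d'Italia.", "json-ld") would send the text and return the serialized result, provided a server is reachable at that URL.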

    Entity detection (tagging) can be further tailored through the system configuration. In principle, any custom vocabulary can be plugged into the system.
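
    The returned RDF describes each detected entity with a handful of properties. The sketch below, which is not part of Stanbol, pulls entity references, labels and confidence values out of an N-Triples response with a small regular expression. The property names are assumed to come from Stanbol's FISE enhancement vocabulary, the sample triples are invented for illustration, and a real application would be better served by a proper RDF library.

```python
import re

# Assumption: namespace of the FISE enhancement vocabulary used by Stanbol.
FISE = "http://fise.iks-project.eu/ontology/"

# Matches a simple N-Triples statement with an IRI subject and predicate and
# either an IRI or a literal object (optional datatype or language tag).
# Blank nodes and other corner cases are deliberately not handled.
TRIPLE_RE = re.compile(
    r'^<(?P<s>[^>]+)>\s+<(?P<p>[^>]+)>\s+'
    r'(?:<(?P<iri>[^>]+)>|"(?P<lit>(?:[^"\\]|\\.)*)"(?:\^\^<[^>]+>|@[A-Za-z-]+)?)'
    r'\s*\.\s*$'
)

# Hypothetical response, made up for this example.
SAMPLE = """
<urn:enhancement-1> <http://fise.iks-project.eu/ontology/entity-reference> <http://dbpedia.org/resource/Rome> .
<urn:enhancement-1> <http://fise.iks-project.eu/ontology/entity-label> "Roma"@it .
<urn:enhancement-1> <http://fise.iks-project.eu/ontology/confidence> "0.92"^^<http://www.w3.org/2001/XMLSchema#double> .
<urn:enhancement-2> <http://fise.iks-project.eu/ontology/entity-reference> <http://dbpedia.org/resource/Italy> .
<urn:enhancement-2> <http://fise.iks-project.eu/ontology/entity-label> "Italia"@it .
<urn:enhancement-2> <http://fise.iks-project.eu/ontology/confidence> "0.81"^^<http://www.w3.org/2001/XMLSchema#double> .
<urn:enhancement-3> <http://purl.org/dc/terms/language> "it" .
"""


def entity_annotations(ntriples):
    """Return (entity IRI, label, confidence) tuples, highest confidence first."""
    by_subject = {}
    for line in ntriples.splitlines():
        match = TRIPLE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and statements the regex cannot parse
        value = match.group("iri")
        if value is None:
            value = match.group("lit")
        # Keeps only the last value seen for each (subject, predicate) pair.
        by_subject.setdefault(match.group("s"), {})[match.group("p")] = value

    results = []
    for props in by_subject.values():
        reference = props.get(FISE + "entity-reference")
        if reference is None:
            continue  # not an entity annotation
        raw_confidence = props.get(FISE + "confidence")
        confidence = float(raw_confidence) if raw_confidence is not None else None
        results.append((reference, props.get(FISE + "entity-label"), confidence))

    # Annotations without a confidence value sort last.
    results.sort(key=lambda item: item[2] if item[2] is not None else -1.0, reverse=True)
    return results


for reference, label, confidence in entity_annotations(SAMPLE):
    print(reference, label, confidence)
```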

    There are a couple of demo end-points:

    I am not sure whether all of the languages above are supported by these endpoints.

    RedLink GmbH is going to provide cloud services based on Apache Stanbol and related software.

    The WordLift plugin for WordPress already provides text analysis within WordPress for all of the languages listed above (it is currently in the testing stage). You can try it out by installing the plugin in WordPress and submitting text in the post body.

    You can also subscribe and write to the Apache Stanbol mailing list for specific requests or information.