Does Stanford CoreNLP support Russian sentence and word tokenization?

I could not find a pre-trained Russian tokenizer in either Stanford NLP or Stanford CoreNLP. Are there any models for Russian yet?

Solution

Unfortunately, I don't know of any extensions that handle Russian tokenization for Stanford CoreNLP.

You can use Stanza (https://stanfordnlp.github.io/stanza/), our Python package, to get Russian tokenization and sentence splitting.
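As a minimal sketch of what that looks like, assuming Stanza is installed and the Russian model can be downloaded: the helper names `build_russian_tokenizer`, `sentences_as_tokens`, and `demo` are made up for this example, while `stanza.download`, `stanza.Pipeline`, `doc.sentences`, and `sentence.tokens` are Stanza's own API.

```python
def build_russian_tokenizer():
    """Download (if needed) and load a tokenize-only Russian pipeline."""
    import stanza  # imported lazily so the helper below works without Stanza

    stanza.download("ru", processors="tokenize")
    return stanza.Pipeline(lang="ru", processors="tokenize")


def sentences_as_tokens(doc):
    """Return one list of token strings per sentence of a Stanza Document."""
    return [[token.text for token in sentence.tokens] for sentence in doc.sentences]


def demo():
    """Tokenize and sentence-split a short Russian text, then print it."""
    nlp = build_russian_tokenizer()
    doc = nlp("Это первое предложение. Это второе предложение.")
    for tokens in sentences_as_tokens(doc):
        print(tokens)
```

Calling `demo()` prints one token list per sentence. The tokenize processor does both sentence splitting and word tokenization, so no other processors are needed for this.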

In theory, you could tokenize and sentence split with Stanza, and then pass the result to the Stanford CoreNLP Server (which you can also use via Stanza) if there are any CoreNLP-specific components you want to work with.
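One way this hand-off might look, as an untested sketch rather than a recommended recipe: write the Stanza output as space-separated tokens with one sentence per line, then ask CoreNLP to split only on whitespace and newlines via its `tokenize.whitespace` and `ssplit.eolonly` properties. The function names are made up for this example, a local CoreNLP install is assumed, and the annotators you pass (plus any Russian model paths they need) are up to you.

```python
def to_pretokenized_text(doc):
    """Join tokens with spaces and put each sentence on its own line."""
    return "\n".join(
        " ".join(token.text for token in sentence.tokens)
        for sentence in doc.sentences
    )


# CoreNLP properties that tell it to trust the existing segmentation.
PRETOKENIZED_PROPS = {
    "tokenize.whitespace": "true",  # split tokens on whitespace only
    "ssplit.eolonly": "true",       # split sentences on newlines only
}


def annotate_with_corenlp(pretokenized_text, annotators):
    """Send pre-tokenized text to a CoreNLP server started through Stanza."""
    from stanza.server import CoreNLPClient  # needs a local CoreNLP install

    with CoreNLPClient(
        annotators=annotators,
        properties=PRETOKENIZED_PROPS,
        be_quiet=True,
    ) as client:
        return client.annotate(pretokenized_text)
```

Here `doc` is the Stanza Document returned by a Russian tokenize pipeline. Any annotator beyond tokenization still needs a model that understands Russian, so the default English models are not a sensible choice for `annotators`.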

A while back, a group submitted some Russian models for CoreNLP, but I don't see anything for tokenization among them.

The link to their resources is here: https://stanfordnlp.github.io/CoreNLP/model-zoo.html
