I have a word in Russian (which is where the difficulty arises). It is an adjective, and I need to convert it to its noun form.
I found an interesting library, pymorphy2, that can parse, inflect, and normalize words, but no matter what I do, I can't get the expected result.
These are names of cities. On the left is the form of the word that I get from the data; on the right is the form that I need:
киселевский ---> киселевск
юргинский ---> юрга
So far, judging by their analysis, the words on the left are recognized only as different forms of an adjective. Is there any way to convert them to nouns?
A short parsing example:
import pymorphy2
morph = pymorphy2.MorphAnalyzer()
word = 'Киселевский'
test = morph.parse(word)[0]
test.tag.POS
>>'ADJF'
test_2 = morph.parse(word)[0].normal_form
test_2
>>'киселевский'
test.lexeme
>> [Parse(word='киселевский', tag=OpencorporaTag('ADJF masc,sing,nomn'), normal_form='киселевский', score=1.0, methods_stack=((FakeDictionary(), 'киселевский', 16, 0), (KnownSuffixAnalyzer(min_word_length=4, score_multiplier=0.5), 'вский'))),
Parse(word='киселевского', tag=OpencorporaTag('ADJF masc,sing,gent'), normal_form='киселевский', score=1.0, methods_stack=((FakeDictionary(), 'киселевского', 16, 1), (KnownSuffixAnalyzer(min_word_length=4, score_multiplier=0.5), 'вский'))),
Parse(word='киселевскому', tag=OpencorporaTag('ADJF masc,sing,datv'), normal_form='киселевский', score=1.0, methods_stack=((FakeDictionary(), 'киселевскому', 16, 2), (KnownSuffixAnalyzer(min_word_length=4, score_multiplier=0.5), 'вский'))),
Parse(word='киселевского', tag=OpencorporaTag('ADJF anim,masc,sing,accs'), normal_form='киселевский', score=1.0, methods_stack=((FakeDictionary(), 'киселевского', 16, 3), (KnownSuffixAnalyzer(min_word_length=4, score_multiplier=0.5), 'вский'))),
Parse(word='киселевский', tag=OpencorporaTag('ADJF inan,masc,sing,accs'), normal_form='киселевский', score=1.0, methods_stack=((FakeDictionary(), 'киселевский', 16, 4), (KnownSuffixAnalyzer(min_word_length=4, score_multiplier=0.5), 'вский'))),
Parse(word='киселевским', tag=OpencorporaTag('ADJF masc,sing,ablt'), normal_form='киселевский', score=1.0, methods_stack=((FakeDictionary(), 'киселевским', 16, 5), (KnownSuffixAnalyzer(min_word_length=4, score_multiplier=0.5), 'вский')))...]
You'll need a database (for example, one built by scraping Wiktionary) to do this, since relational adjectives are not considered to be a form of the noun from which they are derived; they are considered to be a different word (lexeme), as indicated by the pymorphy2 ID (attribute methods_stack[0][2]) of the parse object.
With names of cities I assume this would be even more difficult, since the rules for converting between a source toponym and its relational adjective are not regular (just as in English), and you may only be able to find reliable data for cities above a certain size.
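A minimal sketch of the lookup-table approach, assuming you have already compiled the adjective-to-toponym pairs from some external source. The table name, the helper names, and the `ё`-folding step are hypothetical choices for illustration; the two entries are simply the examples from the question.

```python
# Hypothetical lookup table: relational adjective (lemma) -> source toponym.
# Only the two pairs from the question are included; a real table would have
# to be compiled from an external source such as Wiktionary.
ADJECTIVE_TO_TOPONYM = {
    "киселевский": "киселевск",
    "юргинский": "юрга",
}


def simple_normalize(word):
    """Lower-case the word and fold 'ё' into 'е' so spelling variants match."""
    return word.strip().lower().replace("ё", "е")


def adjective_to_toponym(word, table=ADJECTIVE_TO_TOPONYM,
                         normalize=simple_normalize):
    """Return the toponym for a relational adjective, or None if unknown."""
    return table.get(normalize(word))


print(adjective_to_toponym("Киселевский"))  # киселевск
print(adjective_to_toponym("Юргинский"))    # юрга
print(adjective_to_toponym("Московский"))   # None (not in the table)
```

The default normalizer only handles the nominative form. If the data also contains inflected forms such as киселевского, one option is to pass a normalizer built on the question's own code, for example `normalize=lambda w: morph.parse(w)[0].normal_form`, so that the lemma is looked up instead of the surface form.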