python, nlp, nodebox-linguistics

How can I select and group comparative and superlative words from a text file?


I am trying to filter the words in a text file. If the file contains any 'comparative' or 'superlative' words, I want to convert them to their 'positive' (base) form.

e.g. - 'greatest' -> 'great' and so on.

I am using the 'pattern' module for this. Its example says:

from pattern.en import comparative, superlative
print comparative('bad')

This gives 'worse', which works fine. But if I do:

from pattern.en import comparative, superlative, positive
print positive('worse')

it gives 'False'.

Am I doing it wrong? Is there any way to find the 'comparative' and 'superlative' words and print their positive form?


Solution

  • This is a misunderstanding: the positive() function doesn't do what you think.

    As far as I can see, the pattern.en module only provides functions for generating comparatives and superlatives from the positive form of an adjective, but not for the inverse (analysing the forms as comparative/superlative of a positive form). There is a lemma() function, which you could expect to do this, but unfortunately it only works for verbs.

    The positive() function you found belongs to sentiment detection; it tries to tell if a given sentence has a positive polarity.

    So, what do you do now? I see two possibilities: you either switch to a different library which supports lemmatisation of adjectives (e.g. spaCy), or you try to build a simple adjective lemmatiser based on the code from the pattern.en module.

    If you go for the second option, have a look at the last 80 lines of code in the inflect module. I suggest you first try to catch the irregular cases (using an inversion of the table given there), then strip off the -er/-est suffix. There are probably a number of special cases (like i -> y in heavier -> heavy).
    Try something yourself, and if you run into problems come back here with a new question!
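
To make the second option more concrete, here is a minimal sketch of such a lemmatiser. It does not use pattern.en: the IRREGULAR table is a small hand-written stand-in for an inverted irregular table, positive_form() is a hypothetical name, and known_adjectives is an assumed word list that you would have to supply yourself (without one, stripping -er would also turn 'water' into 'wat'). Treat it as a starting point, not a complete solution.

```python
# Hand-written stand-in for an inverted table of irregular forms
# (an assumption for illustration, not copied from pattern.en).
IRREGULAR = {
    "better": "good", "best": "good",
    "worse": "bad", "worst": "bad",
    "farther": "far", "further": "far",
    "farthest": "far", "furthest": "far",
}


def positive_form(word, known_adjectives):
    """Return the positive form of a comparative/superlative adjective.

    known_adjectives is a set of positive forms used to validate guesses.
    If no guess is found in it, the word is returned unchanged.
    """
    word = word.lower()

    # 1. Irregular cases first.
    if word in IRREGULAR:
        return IRREGULAR[word]

    # 2. Strip the suffix and try a few spelling repairs.
    for suffix in ("est", "er"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[:-len(suffix)]
            candidates = [stem, stem + "e"]           # great-er, larg-er -> large
            if stem.endswith("i"):
                candidates.append(stem[:-1] + "y")    # heavi-er -> heavy
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                candidates.append(stem[:-1])          # bigg-er -> big
            for candidate in candidates:
                if candidate in known_adjectives:
                    return candidate

    # 3. Not recognised as a comparative/superlative.
    return word


if __name__ == "__main__":
    adjectives = {"great", "large", "big", "heavy"}
    for w in ["greatest", "larger", "bigger", "heavier", "worse", "water"]:
        print(w + " -> " + positive_form(w, adjectives))
```

Checking every guess against a word list is a design choice of this sketch: it keeps ordinary words that happen to end in -er or -est untouched, at the cost of needing a reasonable list of adjectives up front.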