python nlp wordnet

Is there an algorithm to calculate a numerical rating of the degree of abstractness of a word in NLP?


Is there an algorithm that can automatically calculate a numerical rating of the degree of abstractness of a word? For example, the algorithm might rate purvey as 1, donut as 0, and immodestly as 0.5 (these are example values).

Abstract words, in this sense, are words that refer to ideas and concepts that are distant from immediate perception, such as economics, calculating, and disputable. Concrete words, on the other hand, refer to things, events, and properties that we can perceive directly with our senses, such as trees, walking, and red.


Solution

  • There's no definition of abstractness that I know of, nor any algorithm to calculate it.

    However, there are several directions I would use as proxies:

    1. Frequency - Abstract concepts are likely to be fairly rare in everyday speech, so a simple IDF (inverse document frequency) score should help identify rare words.

    2. Etymology - Common words in English are usually descended from Germanic roots, while more technical words are usually borrowed from French or Latin, so a Latinate origin can serve as a weak signal.

    3. Supervised learning - If you have a set of Wikipedia articles that you consider abstract, the words and phrases that are common in them probably describe similar abstract concepts. Training a classifier on them can be a way to produce a score.
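
    As a rough illustration of the frequency proxy, here is a minimal sketch of an IDF computation in plain Python. The function name `inverse_document_frequency` and the three-sentence toy corpus are made up for this example, and the smoothing (adding 1 to both counts and to the result) is just one common variant. A real corpus would need to be far larger for the scores to mean anything.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Map each word to a smoothed IDF score over a list of token lists.

    Words that appear in fewer documents get a higher score.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each word once per document, however often it repeats.
        doc_freq.update(set(tokens))
    return {
        word: math.log((1 + n_docs) / (1 + df)) + 1
        for word, df in doc_freq.items()
    }


# Hypothetical toy corpus, only to show the shape of the data.
toy_corpus = [
    "the tree is red".split(),
    "the dog is walking by the tree".split(),
    "economics is the study of scarcity".split(),
]

idf = inverse_document_frequency(toy_corpus)
# Print the rarest words first.
for word, score in sorted(idf.items(), key=lambda item: -item[1]):
    print(f"{word:10s} {score:.3f}")
```

    On this toy data "economics" scores higher than "tree", which scores higher than "the". Note that rarity alone also rewards rare concrete words ("scarcity" and "dog" tie here), which is why it is only a proxy.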

    There's no ground truth as to what is abstract and what is concrete, especially if you try to quantify it. I suggest aggregating these proxies into a metric that is useful for your needs.
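
    One simple way to aggregate is a weighted average, sketched below. Everything here is an assumption for illustration: the proxy names (`rarity`, `latinate`, `classifier`), the weights, and the example values are hypothetical, and each proxy is assumed to have been rescaled to the range 0 to 1 beforehand.

```python
def abstractness_score(proxies, weights):
    """Combine proxy scores (each assumed to lie in [0, 1]) into one number.

    `proxies` maps a proxy name to its score for one word;
    `weights` maps the same names to non-negative weights.
    """
    total_weight = sum(weights[name] for name in proxies)
    if total_weight == 0:
        raise ValueError("at least one proxy needs a non-zero weight")
    weighted_sum = sum(proxies[name] * weights[name] for name in proxies)
    return weighted_sum / total_weight


# Hypothetical weights and proxy values for a single word.
weights = {"rarity": 0.5, "latinate": 0.2, "classifier": 0.3}
example_word = {"rarity": 0.8, "latinate": 1.0, "classifier": 0.6}

score = abstractness_score(example_word, weights)
print(f"{score:.2f}")
```

    The weights are a judgment call. If you have even a small set of words that you have rated by hand, you could tune the weights against those ratings instead of guessing.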