nlp, part-of-speech

Gender Detection for Nouns in Spanish


I am implementing a search engine for Spanish. To ensure gender neutrality, I need to determine the grammatical gender of Spanish nouns, e.g. "pintora" (painter, female) and "pintor" (painter, male). I am currently using the FAIR library, which is really great for NER in Spanish. However, I cannot find a good implementation or library for detecting the gender of Spanish nouns. Could you help me?

Thank you in advance for your help


Solution

  • I searched with multiple search engines, including academic ones, for research papers on Spanish word gender detection and related terms, and could not find anyone who has tackled the problem and implemented a solution in a modern library.

    Regardless, you can still tackle the problem in three steps: run a Spanish part-of-speech (PoS) tagger (for example, RuPERTa-base (Spanish RoBERTa) + POS) to detect nouns and pronouns; combine those labels with your NER output where required; and then write your own rules for determining the gender of each noun or pronoun based on Spanish grammar (such as the rules detailed in A New Reference Grammar of Modern Spanish, specifically Chapter 1, "Gender of nouns").

    Hopefully that gives you some direction if you don't end up finding a ready-made implementation.
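
To make the "write your own rules" step more concrete, here is a minimal sketch of a suffix-based gender guesser. It assumes the input words have already been identified as nouns by a PoS tagger. The function name `guess_gender`, the suffix lists, and the exception lists are all hypothetical and deliberately small; they reflect a few widely taught generalizations about Spanish endings and are not taken from the grammar mentioned above or from any library.

```python
# Illustrative only: these lists are assumptions and far from exhaustive.
# Nouns that break the usual -o / -a / -or pattern.
FEMININE_EXCEPTIONS = {"mano", "foto", "moto", "flor", "labor"}
MASCULINE_EXCEPTIONS = {"día", "mapa", "planeta", "problema", "sistema",
                        "idioma", "tema", "clima", "programa", "sofá"}

# Longer, more specific suffixes come first so they are matched before "a"/"o".
FEMININE_SUFFIXES = ("ción", "sión", "dad", "tad", "tud", "umbre", "triz", "ora", "a")
MASCULINE_SUFFIXES = ("aje", "or", "o")


def guess_gender(noun: str) -> str:
    """Return 'feminine', 'masculine', 'common' or 'unknown' for a Spanish noun."""
    word = noun.strip().lower()

    # 1. Known exceptions override every suffix rule.
    if word in FEMININE_EXCEPTIONS:
        return "feminine"
    if word in MASCULINE_EXCEPTIONS:
        return "masculine"

    # 2. Nouns in -ista keep the same form for both genders (el/la artista).
    if word.endswith("ista"):
        return "common"

    # 3. Fall back to suffix heuristics.
    if word.endswith(FEMININE_SUFFIXES):
        return "feminine"
    if word.endswith(MASCULINE_SUFFIXES):
        return "masculine"

    # 4. Endings such as -e or most consonants are not decided by these rules.
    return "unknown"


if __name__ == "__main__":
    for noun in ["pintora", "pintor", "mano", "problema", "artista", "estudiante"]:
        print(noun, "->", guess_gender(noun))
```

Anything that comes back as "common" or "unknown" would need either a larger exception list or context from the sentence (for example, the gender of an accompanying article or adjective), so treat this as a starting point rather than a complete solution.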