solr, lucene, nlp, stemming

Are there any Lucene stemmers that handle Shakespearean English?


I'm trying to index some old documents (16th, 17th, and 18th century) for searching.

Modern stemmers don't seem to handle the antiquated word endings: worketh, liveth, walketh.

Are there stemmers that specialize in English from the time of Shakespeare and the King James Bible? I'm currently using solr.PorterStemFilterFactory.


Solution

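  • The rule changes needed for this look minimal: judging by your examples, mostly the archaic third-person singular "-eth" ending.

    So it might be possible to copy and modify the PorterStemmer class and its related filters and factories.

    Alternatively, it might be possible to add those specific rules as a regular-expression filter that runs before the Porter stemmer.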
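To make the regular-expression idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. It rewrites a trailing "-eth" to the modern "-es" so that a stemmer running afterwards sees a familiar ending. The class name ArchaicSuffixNormalizer, the exception list, and the minimum stem length of two letters are all assumptions made for illustration, not part of Lucene or Solr, and the input is assumed to be lowercased already. In Solr, the same pattern could probably be configured through solr.PatternReplaceFilterFactory placed before solr.PorterStemFilterFactory in the analyzer chain, though I have not tested that setup.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not a Lucene or Solr class.
public class ArchaicSuffixNormalizer {
    // At least two letters followed by a final "eth" (assumed threshold).
    private static final Pattern ETH = Pattern.compile("^(\\p{L}{2,})eth$");

    // Words that end in "eth" but are not archaic verb forms.
    // Illustrative only; a real list would need to be much longer.
    private static final Set<String> EXCEPTIONS =
            Set.of("teeth", "macbeth", "elizabeth");

    // Rewrites e.g. "worketh" to "workes"; other tokens pass through unchanged.
    public static String normalize(String token) {
        if (EXCEPTIONS.contains(token)) {
            return token;
        }
        Matcher m = ETH.matcher(token);
        return m.matches() ? m.group(1) + "es" : token;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"worketh", "liveth", "walketh", "teeth", "work"}) {
            System.out.println(w + " -> " + normalize(w));
        }
    }
}
```

Rewriting to "-es" rather than simply deleting "-eth" keeps "liveth" as "lives" instead of "liv", which gives the stemmer a regular modern form to work on. Irregular forms such as "hath" and "doth" are not covered and would need their own rules or a synonym list.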