I realise that 3.0.2 is an old version of Lucene but if I have Java code as follows:
int nGramLength = 3;
Set<String> stopWords = new HashSet<String>();
stopWords.add("the");
stopWords.add("and");
...
SnowballAnalyzer snowballAnalyzer = new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);
ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(snowballAnalyzer, nGramLength);
which will generate the frequency of n-grams from a particular string of text without stop words, how can I disable the LowerCaseFilter which forms part of the SnowballAnalyzer? I want to preserve the case of the n-grams generated so that I can perform various counts according to the presence or absence of upper-case characters in the n-grams.
I am something of a Lucene newbie. And I should add that upgrading the version of Lucene is not an option here.
The Snowball analyzer is a convenience class for using SnowballFilter. LowerCaseFilter is baked into the code.
Just copy the SnowballAnalyzer source and remove line 103: streams.result = new LowerCaseFilter(streams.result);
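Once the case is preserved, the counts mentioned in the question could be sketched roughly as below. This is only an illustration in plain Java, not Lucene code: the class NGramCaseCounts, its methods, and the sample n-grams are all hypothetical, and it assumes you have already collected the n-gram frequencies into a Map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: splits n-gram frequencies
// by whether the n-gram contains any upper-case character.
class NGramCaseCounts {

    /** Returns true if the n-gram contains at least one upper-case character. */
    static boolean hasUpperCase(String ngram) {
        for (int i = 0; i < ngram.length(); i++) {
            if (Character.isUpperCase(ngram.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    /** Sums the frequencies of n-grams with and without upper-case characters. */
    static Map<String, Integer> countByCase(Map<String, Integer> ngramFrequencies) {
        int withUpper = 0;
        int allLower = 0;
        for (Map.Entry<String, Integer> entry : ngramFrequencies.entrySet()) {
            if (hasUpperCase(entry.getKey())) {
                withUpper += entry.getValue();
            } else {
                allLower += entry.getValue();
            }
        }
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        result.put("withUpperCase", withUpper);
        result.put("allLowerCase", allLower);
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample frequencies, standing in for the analyzer's output.
        Map<String, Integer> freqs = new LinkedHashMap<String, Integer>();
        freqs.put("New York Times", 3);
        freqs.put("quick brown fox", 2);
        freqs.put("visit New York", 1);
        System.out.println(countByCase(freqs)); // prints {withUpperCase=4, allLowerCase=2}
    }
}
```

Bear in mind that with lower-casing removed, a stop word list containing "the" will no longer match "The" unless the stop filter is told to ignore case, so it is worth checking that against your copy of the analyzer.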