I have a problem using an OrientDB Lucene index. When I query with it, it returns an incomplete result set. Here is an example:
create class Foo extends V
create property Foo.text string
create index Foo.text_spanish on Foo(text) fulltext engine lucene metadata
{ "analyzer": "org.apache.lucene.analysis.es.SpanishAnalyzer",
"index": "org.apache.lucene.analysis.es.SpanishAnalyzer",
"query": "org.apache.lucene.analysis.es.SpanishAnalyzer",
"allowLeadingWildcard": true
}
insert into Foo (text) values ("axxx")
insert into Foo (text) values ("áxxx")
insert into Foo (text) values ("xxxa")
insert into Foo (text) values ("xxxá")
insert into Foo (text) values ("xxaxx")
insert into Foo (text) values ("xxáxx")
Now, when I run this query:
select from Foo where text lucene "*a*"
I get:
xxáxx
xxaxx
xxxa
axxx
And it misses:
áxxx
xxxá
And if I run this:
select from Foo where text lucene "*á*"
I get:
áxxx
xxxá
And it misses the rest. Even in this case it should at least return xxáxx. What am I doing wrong?
By default, OrientDB supports all of the analyzers listed here. However, characters outside the "Basic Latin" block, such as á, are only matched by their unaccented equivalents if you create a custom analyzer that includes a suitable filter, such as ASCIIFoldingFilter.
Once you have created and compiled the class, copy its .jar into OrientDB's lib directory and then create the index with your custom analyzer.
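To make the idea of folding concrete, here is a minimal sketch in plain Java. It does not use Lucene or OrientDB: it uses java.text.Normalizer as a rough stand-in for what ASCIIFoldingFilter does to accented Latin letters, and the names FoldingSketch, fold and matchFolded are hypothetical, chosen only for this illustration. A real fix would still need a Lucene analyzer that wraps ASCIIFoldingFilter, which is not shown here because its exact API depends on the Lucene version bundled with your OrientDB release.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FoldingSketch {
    // Decompose accented characters (NFD) and drop the combining marks,
    // so that "\u00e1" (a with acute accent) becomes plain "a".
    // This only approximates ASCIIFoldingFilter; it is not the Lucene filter.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Emulates a "*needle*" wildcard match where both the stored value
    // and the query term are folded before being compared.
    static List<String> matchFolded(List<String> values, String needle) {
        String foldedNeedle = fold(needle);
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            if (fold(v).contains(foldedNeedle)) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The six rows from the question; \u00e1 is the accented a.
        List<String> values = Arrays.asList(
            "axxx", "\u00e1xxx", "xxxa", "xxx\u00e1", "xxaxx", "xx\u00e1xx");
        System.out.println(matchFolded(values, "a"));
        System.out.println(matchFolded(values, "\u00e1"));
    }
}
```

In this sketch both calls return all six values, because the accented and unaccented forms collapse to the same string before the comparison. That is the behaviour a folding analyzer is meant to give you when it is applied both at index time and at query time.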
In the meantime, a quick workaround would be:
SELECT FROM Foo WHERE text LUCENE "*a*" OR text LUCENE "*á*";