Tags: orientdb, orientdb2.2

OrientDB incorrect query result against lucene search


I have a problem using an OrientDB Lucene index. When I query through it, it returns an incomplete result set. Here is an example:

create class Foo extends V
create property Foo.text string
create index Foo.text_spanish on Foo(text) fulltext engine lucene metadata 
        { "analyzer": "org.apache.lucene.analysis.es.SpanishAnalyzer", 
          "index": "org.apache.lucene.analysis.es.SpanishAnalyzer", 
          "query": "org.apache.lucene.analysis.es.SpanishAnalyzer", 
          "allowLeadingWildcard": true             
}

insert into Foo (text) values ("axxx")
insert into Foo (text) values ("áxxx")
insert into Foo (text) values ("xxxa")
insert into Foo (text) values ("xxxá")
insert into Foo (text) values ("xxaxx")
insert into Foo (text) values ("xxáxx")

Now, when I run this query:

select from Foo where text lucene "*a*"

I get:

xxáxx
xxaxx
xxxa
axxx

But these are missing:

áxxx
xxxá

And if I run this:

select from Foo where text lucene "*á*"

I get:

áxxx
xxxá

The rest are missing. Even in this case it should at least return xxáxx. What am I doing wrong?


Solution

  • By default, OrientDB supports the analyzers bundled with Lucene. However, characters outside the "Basic Latin" block, such as á, are matched against their unaccented equivalents only when you create a custom analyzer that includes a suitable filter, such as ASCIIFoldingFilter.

    Once you have written and compiled the analyzer class, place its .jar in OrientDB's lib directory and then create the index with your custom analyzer.

    In the meantime, a quick workaround is:

    SELECT FROM Foo WHERE text LUCENE "*a*" OR text LUCENE "*á*";
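
To illustrate why an accent-folding filter helps, here is a minimal Java sketch. It does not use Lucene: it only approximates the effect of ASCIIFoldingFilter with java.text.Normalizer from the standard library (an assumption made for illustration; the real filter covers many more characters). The names FoldingDemo, fold, and containing are hypothetical. Once both the stored values and the search term are folded, all six sample rows contain "a", whichever spelling is searched for.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: approximates accent folding with the JDK, not with Lucene.
class FoldingDemo {

    // Decompose to NFD, then strip combining marks, so U+00E1 (a-acute) becomes "a".
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Return the values whose folded form contains the folded needle.
    static List<String> containing(List<String> values, String needle) {
        String foldedNeedle = fold(needle);
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            if (fold(v).contains(foldedNeedle)) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The six sample rows from the question; \u00e1 is a-acute.
        List<String> values = Arrays.asList(
            "axxx", "\u00e1xxx", "xxxa", "xxx\u00e1", "xxaxx", "xx\u00e1xx");

        System.out.println(containing(values, "a").size());      // prints 6
        System.out.println(containing(values, "\u00e1").size()); // prints 6
    }
}
```

In a real setup the folding would happen inside the custom analyzer, rather than in application code as shown here.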