c# lucene lucene.net

Lucene.NET 4.8 sort returns documents with NULL fields, invalidating the sort


Using Lucene.Net, the indexer indexes each object with a Document field used specifically for sorting, like this:

new TextField("lastname-sort", person.LastName.RemoveDiacritics(), Field.Store.NO)

However, searching with a sort built from new SortField("lastname-sort", SortFieldType.STRING, false) returns some ScoreDocs (the first ones) whose fields are null, which seems odd to me.

I searched the documentation (https://lucenenet.apache.org/docs/4.8.0-beta00005/api/Lucene.Net/Lucene.Net.Search.Sort.html?q=sort), which is not up to date because it uses new Field and Field.Index.NOT_ANALYZED, both of which are deprecated, and I also searched Stack Overflow, but I could not find an answer.

Note that the ScoreDocs hits whose fields are not null are sorted correctly.

EDIT: Here is roughly how the sort is done. The code is simplified; query, searcher, reader and sort are all non-null.

The Document contains a field for sorting and multiple other stored fields:

var doc = new Document
{
  // Sort-only field: analyzed by the index analyzer, not stored
  new TextField("lastname-sort", person.LastName.RemoveDiacritics(), Field.Store.NO),
  // other fields
};

sort:

var sort = new Sort(new SortField("lastname-sort", SortFieldType.STRING));

search:

TopDocs hits = _indexSearcher!.Search(query, _nrtIndexReader!.NumDocs, sort);

In the results, the top X ScoreDocs (a dozen out of 6000, always the same ones) have all of their Document's fields set to NULL (they are not NULL when indexed).

var indexedPersons = hits.ScoreDocs
    .Select(hit => _indexSearcher.Doc(hit.Doc))
    .Select(document => new IndexedPerson
    {
        Uid = document.Get("uid"),
        FirstName = document.Get("firstname"),
        LastName = document.Get("lastname"),
        // Some Other Fields
    })
    .ToList();

Note that Uid is not NULL, which allows us to retrieve the other values from the DB.


Solution

  • The null fields were returned because of the analyzer used for both indexing and searching:

    private readonly StandardAnalyzer _standardAnalyzer = new(AppLuceneVersion);

    With its default stop word set, some last names (Will, To, etc.) were treated as stop words. The corresponding field values therefore end up as NULL, and the sort treats those documents as if the field were empty (null).

    Changing the declaration to private readonly StandardAnalyzer _standardAnalyzer = new(AppLuceneVersion, CharArraySet.EMPTY_SET); solved the issue.
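
To check whether a given last name is affected, one option is to count the tokens each analyzer produces for it. The following is only a minimal sketch, not the code from the application above: the class name StopWordCheck, the CountTokens helper and the sample names are made up for illustration, AppLuceneVersion is assumed to be LuceneVersion.LUCENE_48, and the CharArraySet.EMPTY_SET constant name is taken from the post (other Lucene.NET releases may expose it under a different name). It assumes the Lucene.Net and Lucene.Net.Analysis.Common packages are referenced.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class StopWordCheck
{
    // Assumption: the application targets the 4.8 compatibility version.
    private const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48;

    // Hypothetical helper: counts the tokens an analyzer emits for one value.
    // Zero tokens means nothing would be indexed for an analyzed field.
    private static int CountTokens(Analyzer analyzer, string text)
    {
        int count = 0;
        using (TokenStream stream = analyzer.GetTokenStream("lastname-sort", new StringReader(text)))
        {
            stream.Reset();
            while (stream.IncrementToken())
            {
                count++;
            }
            stream.End();
        }
        return count;
    }

    public static void Main()
    {
        // Default constructor: uses the built-in English stop word set.
        using var withStopWords = new StandardAnalyzer(AppLuceneVersion);
        // Empty stop set: no token is dropped as a stop word.
        using var withoutStopWords = new StandardAnalyzer(AppLuceneVersion, CharArraySet.EMPTY_SET);

        // Sample values only; "Will" and "To" are the names mentioned above.
        foreach (string lastName in new[] { "Will", "To", "Smith" })
        {
            Console.WriteLine(
                $"{lastName}: default={CountTokens(withStopWords, lastName)}, " +
                $"empty stop set={CountTokens(withoutStopWords, lastName)}");
        }
    }
}
```

A name that yields zero tokens with the default analyzer but one token with the empty stop set is one whose sort field would have been left without a value. Documents indexed before the analyzer change need to be re-indexed for the sort field to be populated.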