c#, lucene, lucene.net

Lucene.NET TextField not being indexed


Using .NET 6.0 and Lucene.NET-4.8.0-beta00016 from NuGet

I am having an issue implementing the quickstart example from the website. When I use a TextField in a document, the field does not appear to be indexed: the search later in the BuildIndex method returns no results. If I change TextField to StringField, the example works and the search returns a valid result.

Why does StringField work when TextField doesn't? I read that StringField is not analyzed but TextField is, so perhaps it has something to do with the StandardAnalyzer?

public class LuceneFullTextSearchService {

private readonly IndexWriter _writer;
private readonly Analyzer _standardAnalyzer;

public LuceneFullTextSearchService(string indexName)
{
    // Compatibility version
    const LuceneVersion luceneVersion = LuceneVersion.LUCENE_48;
    string indexPath = Path.Combine(Environment.CurrentDirectory, indexName);
    Directory indexDir = FSDirectory.Open(indexPath);

    // Create an analyzer to process the text 
    _standardAnalyzer = new StandardAnalyzer(luceneVersion);

    // Create an index writer
    IndexWriterConfig indexConfig = new IndexWriterConfig(luceneVersion, _standardAnalyzer)
    {
        OpenMode = OpenMode.CREATE_OR_APPEND,
    };
    _writer = new IndexWriter(indexDir, indexConfig);
}

public void BuildIndex(string searchPath)
{
    Document doc = new Document();
    
    TextField docText = new TextField("title", "Apache", Field.Store.YES); 
    doc.Add(docText);
    
    _writer.AddDocument(doc);

    //Flush and commit the index data to the directory
    _writer.Commit();
    
    // Parse the user's query text
    Query query = new TermQuery(new Term("title", "Apache"));
    
    // Search
    using DirectoryReader reader = _writer.GetReader(applyAllDeletes: true);
    IndexSearcher searcher = new IndexSearcher(reader);
    TopDocs topDocs = searcher.Search(query, n: 2);

    // Show results
    Document resultDoc = searcher.Doc(topDocs.ScoreDocs[0].Doc);
    string title = resultDoc.Get("title");
}
}

Solution

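  • StandardAnalyzer includes a LowerCaseFilter, so the terms of a TextField are indexed in lower case: "Apache" becomes the term "apache". (Field.Store.YES still keeps the original casing in the stored value, but the stored value is only used for retrieval, not for matching.)

    A TermQuery is not analyzed; it looks for exactly the term you give it. Your query uses "Apache" rather than "apache", so it doesn't produce any hits. A StringField is indexed verbatim as a single term with no analysis, which is why the same query matches when you switch to StringField.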

    // Parse the user's query text
    Query query = new TermQuery(new Term("title", "Apache"));
    

    Option 1

    Lowercase your search term.

    // Parse the user's query text
    Query query = new TermQuery(new Term("title", "Apache".ToLowerInvariant()));
    

    Option 2

    Use a QueryParser with the same analyzer you use to build the index.

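    // luceneVersion is local to the constructor in the question, so the constant is passed directly here
    QueryParser parser = new QueryParser(LuceneVersion.LUCENE_48, "title", _standardAnalyzer);
    Query query = parser.Parse("Apache");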
    

    The Lucene.Net.QueryParser NuGet package contains several query parser implementations; the example above uses Lucene.Net.QueryParsers.Classic.QueryParser.
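
To see the behaviour described above in one place, here is a minimal sketch written as a .NET 6 top-level program. It is not the code from the question: it assumes the Lucene.Net, Lucene.Net.Analysis.Common and Lucene.Net.QueryParser packages are referenced, uses an in-memory RAMDirectory instead of FSDirectory, and introduces a hypothetical second field named "titleExact" purely to contrast StringField with TextField.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

const LuceneVersion version = LuceneVersion.LUCENE_48;

// In-memory index so the sketch leaves nothing on disk (assumption: the question uses FSDirectory)
using var dir = new RAMDirectory();
using var analyzer = new StandardAnalyzer(version);
using var writer = new IndexWriter(dir, new IndexWriterConfig(version, analyzer));

writer.AddDocument(new Document
{
    // Analyzed: StandardAnalyzer lower-cases the token, so the indexed term is "apache"
    new TextField("title", "Apache", Field.Store.YES),
    // Not analyzed: indexed verbatim as the single term "Apache" ("titleExact" is a hypothetical name)
    new StringField("titleExact", "Apache", Field.Store.YES)
});
writer.Commit();

using DirectoryReader reader = writer.GetReader(applyAllDeletes: true);
var searcher = new IndexSearcher(reader);

// Helper: number of documents matching a query
int Hits(Query q) => searcher.Search(q, 1).TotalHits;

// TermQuery is not analyzed, so the casing of the term must match the indexed term exactly
Console.WriteLine($"TextField,   term \"Apache\": {Hits(new TermQuery(new Term("title", "Apache")))}");      // 0
Console.WriteLine($"TextField,   term \"apache\": {Hits(new TermQuery(new Term("title", "apache")))}");      // 1
Console.WriteLine($"StringField, term \"Apache\": {Hits(new TermQuery(new Term("titleExact", "Apache")))}"); // 1

// QueryParser runs the query text through the same analyzer, so "Apache" becomes "apache"
var parser = new QueryParser(version, "title", analyzer);
Console.WriteLine($"QueryParser, text \"Apache\": {Hits(parser.Parse("Apache"))}");                          // 1
```

Checking the hit count before reading topDocs.ScoreDocs[0], as the helper above does, also avoids the exception the question's code would throw when a query matches nothing.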