lucene, search-engine, lucene.net

Lucene difference between Term and Fields


I've read a lot about Lucene indexing and searching, but I still can't understand what a term is. What is the difference between a term and a field?


Solution

  • A very rough analogy would be that fields are like columns in a database table, and terms are like the contents in each database column.

    More specifically to Lucene:

    Terms

    Terms are indexed tokens. See here:

    Lucene Analyzers are processing pipelines that break up text into indexed tokens, a.k.a. terms

    So, for example, if you have the following sentence in a document...

    "This is a list of terms"
    

    ...and you pass it through a whitespace tokenizer, this will generate the following terms:

    This
    is
    a
    list
    of
    terms
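
As a rough illustration of the tokenization step above, here is a minimal Python sketch that mimics what a whitespace tokenizer does. It is not Lucene's API: the function name `whitespace_tokenize` is made up for this example, and a real Lucene analyzer may apply further steps (such as lowercasing) that this sketch leaves out.

```python
def whitespace_tokenize(text):
    """Split text on runs of whitespace; each resulting token is a term."""
    # str.split() with no arguments splits on any whitespace
    # and discards empty strings, so extra spaces are harmless.
    return text.split()


terms = whitespace_tokenize("This is a list of terms")
print(terms)  # ['This', 'is', 'a', 'list', 'of', 'terms']
```

Note that the case is preserved here, so "This" and "this" would count as different terms unless something in the pipeline normalizes them.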
    

    Terms are therefore also what you place into queries when performing searches. See here for a definition of how they are used in the classic query parser.
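
To make the link between queries and terms a little more concrete, here is a toy Python sketch that splits a query string into (field, term) pairs, loosely modeled on the `field:term` notation of the classic query parser. The function `parse_simple_query` and the default field name `"body"` are assumptions made for this example. The sketch handles only the simplest case and ignores phrases, boolean operators, wildcards and everything else the real parser supports.

```python
def parse_simple_query(query, default_field="body"):
    """Turn 'title:lucene terms' into a list of (field, term) pairs."""
    clauses = []
    for token in query.split():
        field_name, sep, text = token.partition(":")
        if sep and field_name and text:
            # Token looked like "field:term".
            clauses.append((field_name, text))
        else:
            # No usable field prefix: fall back to the default field.
            clauses.append((default_field, token))
    return clauses


print(parse_simple_query("title:lucene terms"))
# [('title', 'lucene'), ('body', 'terms')]
```

Each pair names one term to look for and the field to look for it in (fields are described in the next section).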

    Fields

    A field is a section of a document.

    A simple example is the title of a document versus the body (the remaining text/content) of the document. These can be defined as two separate Lucene fields within a Lucene index.

    (You need to be able to parse the source document so that you can separate the title from the body. Otherwise you cannot populate each field correctly while building your Lucene index.)

    You can then place all of the title's terms into the title field, and the body's terms into the body field.

    Now you can search title data separately from body data.
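
The idea of searching one field independently of another can be sketched with a toy inverted index in plain Python. This is only an illustration of the concept, not how Lucene is implemented or used: the class `TinyIndex`, its methods, and the sample documents are all invented for this example, and the lowercasing step is an assumption about the analysis applied.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index mapping (field, term) -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, fields):
        # fields is a dict such as {"title": "...", "body": "..."}.
        for field_name, text in fields.items():
            # Whitespace tokens, lowercased, become the terms of this field.
            for term in text.lower().split():
                self.postings[(field_name, term)].add(doc_id)

    def search(self, field_name, term):
        # Look up a single term in a single field only.
        # .get() avoids inserting empty entries for unknown terms.
        return sorted(self.postings.get((field_name, term.lower()), set()))


index = TinyIndex()
index.add_document(1, {"title": "Lucene basics", "body": "An introduction to indexing"})
index.add_document(2, {"title": "Indexing tips", "body": "Lucene stores terms per field"})

print(index.search("title", "lucene"))  # [1]
print(index.search("body", "lucene"))   # [2]
```

The same word gives different results depending on the field searched, because the index key is the field name together with the term text. Lucene's own `Term` class pairs a field name with the term text in a similar way.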

    You can read about fields here and here. There are various types of fields, each suited to the type of data (terms) it will hold.