
Apache Tika vs. Apache Lucene


I have a question about analyzing documents. With Apache Tika, it is possible to extract the content and metadata of files of many different types.

Is it also possible to get keywords from files (i.e. with stemming) using Tika, or do I still need Lucene for that?


Solution

  • I don't know if it's possible, but I would recommend doing all the keyword analysis in Lucene. My personal reasons:

    • Tika's main goal is to extract information (text and metadata) from files.
    • Lucene defines how the data is analyzed and indexed. How the data is analyzed (tokenizing, stop-word removal, stemming) has a big impact on how your Lucene index performs in searches, i.e. whether you find the things you expect to find.
    • It's a kind of separation of concerns: Tika only extracts, and Lucene takes care of everything that is relevant to search.
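
To make that split concrete, here is a minimal sketch of how the two libraries could be combined. It is only an illustration, not the one right way to do it: the class name `KeywordSketch`, its two helper methods, and the field name "content" are made up for this example, and it assumes Tika (core plus parsers) and Lucene (core plus the common analysis module) are on the classpath. `EnglishAnalyzer` is just one possible choice; pick the analyzer that matches your documents' language.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical helper class for illustration; not part of Tika or Lucene.
class KeywordSketch {

    // Step 1 (Tika): detect the file type and extract its plain text.
    // Note: the Tika facade truncates very long documents by default.
    static String extractText(File file) throws IOException, TikaException {
        return new Tika().parseToString(file);
    }

    // Step 2 (Lucene): run the text through an analyzer, which tokenizes,
    // lowercases, drops English stop words, and applies Porter stemming.
    static List<String> stemmedTerms(String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (Analyzer analyzer = new EnglishAnalyzer();
             // "content" is an arbitrary field name chosen for this sketch.
             TokenStream stream = analyzer.tokenStream("content", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                 // required before the first incrementToken()
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();                   // required after the last token
        }
        return terms;
    }

    public static void main(String[] args) throws Exception {
        String text = extractText(new File(args[0]));
        System.out.println(stemmedTerms(text));
    }
}
```

If you use the same analyzer for indexing and for querying, the stemmed terms line up on both sides, which is why it makes sense to keep this step in Lucene rather than in the extraction layer.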