I use Solr to index my blog, and a single article can have tens of thousands of words, so when I run a query the response can look like this:

    "response": {
        "numFound": 4,
        "start": 0,
        "docs": [
            {
                "content": ["abc........"], // the whole article may have 10000 words
                "_id": "5d48f6d598b89e22d07629a0",
                "_version_": 1642371362640101376
            },
            ...
        ]
    }

There's no need to return the whole article on the search page. Is it possible to return just the first 100 words of the article?
It sounds like you are currently storing the whole article but only want to return the first paragraph or so.
The easiest way to do this is to mark your content field as indexed only (stored=false) and clone it into a separate stored field (indexed=false) that holds just the part you want to return.
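You cannot do this in the field's analyzer (analysis only affects the indexed tokens, not the stored value), but you can do it with an UpdateRequestProcessor pipeline, which changes the document itself before it is indexed and stored.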
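To make the data flow concrete, here is a plain-Python model of what such a chain does to one incoming document: copy content into a second field, then shorten the copy. It is only an illustration, not Solr code; the helper names, the summary field, and the 300-character limit are hypothetical, and in Solr the equivalent steps would be declared as processors in an update chain in solrconfig.xml.

```python
from typing import Callable, Dict, List

# A document modeled as field name -> list of values
# (content is multi-valued in the sample response above).
Doc = Dict[str, List[str]]
Processor = Callable[[Doc], Doc]


def clone_field(source: str, dest: str) -> Processor:
    # Hypothetical stand-in for a clone step: copy every value of `source` into `dest`.
    def process(doc: Doc) -> Doc:
        if source in doc:
            doc.setdefault(dest, []).extend(doc[source])
        return doc
    return process


def truncate_field(field: str, max_length: int) -> Processor:
    # Hypothetical stand-in for a truncate step: cut each value of `field`
    # to at most `max_length` characters; other fields are left alone.
    def process(doc: Doc) -> Doc:
        if field in doc:
            doc[field] = [value[:max_length] for value in doc[field]]
        return doc
    return process


def run_chain(doc: Doc, chain: List[Processor]) -> Doc:
    # Processors run in order; each one sees the output of the previous one.
    for processor in chain:
        doc = processor(doc)
    return doc


# In the schema (not modeled here), content would be indexed but not stored,
# and the hypothetical summary field stored but not indexed.
chain = [clone_field("content", "summary"), truncate_field("summary", 300)]
doc = run_chain({"content": ["abc " * 5000]}, chain)
print(len(doc["content"][0]), len(doc["summary"][0]))  # 20000 300
```

Because the full text still reaches the indexed content field, queries keep matching words deep in the article, while only the short summary copy is returned with each hit.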
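So you would use CloneFieldUpdateProcessorFactory to copy content into the new field, followed by a second processor that shortens the copy.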
The real challenge is truncating to 100 words, because a "word" is surprisingly hard to define in a language-neutral way (and what about punctuation?). If you are happy to truncate by characters, you can do that with TruncateFieldUpdateProcessorFactory. But if it has to be words, you could look into RegexReplaceProcessorFactory and define a regular expression that keeps only the first 100 of them.
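Below is a minimal sketch of the word-based option, written in Python so it can be run and checked directly. It assumes a "word" is any run of non-whitespace characters, so punctuation stays attached ("world!" counts as one word), which is exactly the ambiguity mentioned above. The helper names are hypothetical. The pattern constructs used here ((?s), \S, \s, {n}) behave the same way in Java's regex engine, which Solr runs on, but in Java a group reference in the replacement is written $1 rather than \1; check the processor's documentation for how it treats the replacement string.

```python
import re


def first_n_words_pattern(n: int) -> re.Pattern:
    # Group 1 captures the first n whitespace-separated words (n >= 1).
    # (?s) lets "." match newlines, so the trailing ".*" consumes the rest of the article.
    # \S+ and \s+ cannot overlap, which keeps backtracking cheap on long texts.
    if n < 1:
        raise ValueError("n must be at least 1")
    return re.compile(r"(?s)^\s*((?:\S+\s+){%d}\S+).*" % (n - 1))


def truncate_words(text: str, n: int = 100) -> str:
    # Replace the whole match with group 1; if the text has fewer than n words
    # there is no match and re.sub returns it unchanged.
    return first_n_words_pattern(n).sub(r"\1", text)


article = " ".join(f"w{i}" for i in range(150))
print(truncate_words(article, 3))  # w0 w1 w2
print(truncate_words("Hello, world! Bye.", 2))  # Hello, world!
```

For 100 words the generated pattern is (?s)^\s*((?:\S+\s+){99}\S+).* and text with fewer than 100 words does not match at all, so it passes through unchanged.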