I have a Solr index that contains a large number of quite long text files, indexed with the text_sv
schema. I want to print every snippet for each indexed document. However, I only get a few back, even though I have tried adjusting the various settings as described in the documentation.
Here is the code section:
results = solr.search(search_string, rows=result_limit, sort=order,
                      **{
                          'hl': 'true',
                          'hl.fragsize': 100,
                          'hl.fl': 'fulltext',
                          'hl.maxAnalyzedChars': -1,
                          'hl.snippets': 100,
                      })
resultcounter = 0
for result in results:
    resultcounter += 1
    fulltexturl = ('<a href="http://localhost/source/' + result['filename']
                   + '">' + result['filename'][:-4] + '</a>')
    year = str(result['year'])
    number = str(result['number'])
    highlights = results.highlighting
    print("Saw {0} result(s).".format(len(results)))
    print('<p>' + str(resultcounter) + '. <b>År:</b> ' + year
          + ', <b>Nummer: </b>' + number + ' ,<b>Fulltext:</b> ' + fulltexturl
          + '. <b></b> träffar.<br></p>')
    inSOUresults = 1
    for idnumber, h in highlights.items():
        for key, value in h.items():
            for v in value:
                print('<p>' + str(inSOUresults) + ". " + v + "</p>")
                inSOUresults += 1
What am I doing wrong?
You probably want a very large (or 0) value for the hl.fragsize
parameter (from the Highlighting wiki page):
With the original Highlighter, if you have a use case where you need to highlight the complete text of a field and need to highlight every instance of the search term(s) you can set hl.fragsize to a very high value (whatever it takes to include all the text for the largest value for that field), for example &hl.fragsize=50000.
However, if you want to change fragsize to a value greater than 51200 to return long document texts with highlighting, you will need to pass the same value to hl.maxAnalyzedChars parameter too. These two parameters go hand in hand and changing just the hl.fragsize would not be sufficient for highlighting in very large fields.
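As a rough sketch of how that advice might be applied to the code in the question: the helper below builds the highlighting parameters with hl.fragsize and hl.maxAnalyzedChars set to the same value, and a second helper picks out the fragments for a single document. The function names, the default of 1000000 characters, and the use of an `id` field as the unique key are assumptions for illustration, not something taken from the question.

```python
def highlight_params(field, max_chars=1000000, snippets=100):
    """Build highlighting parameters (hypothetical helper).

    hl.fragsize and hl.maxAnalyzedChars are set to the same value,
    as the quoted wiki text recommends for very large fields.
    """
    return {
        'hl': 'true',
        'hl.fl': field,
        'hl.fragsize': max_chars,
        'hl.maxAnalyzedChars': max_chars,
        'hl.snippets': snippets,
    }


def snippets_for(highlighting, doc_id, field):
    """Return the highlighted fragments for one document, or an empty list.

    `highlighting` is expected to be a dict shaped like
    {doc_id: {field: [fragment, ...]}}.
    """
    return highlighting.get(doc_id, {}).get(field, [])


# Possible usage with the code from the question (needs a running Solr
# instance; assumes the unique key field is called 'id'):
# results = solr.search(search_string, rows=result_limit, sort=order,
#                       **highlight_params('fulltext'))
# for result in results:
#     for snippet in snippets_for(results.highlighting, result['id'], 'fulltext'):
#         print('<p>' + snippet + '</p>')
```

Looking up the highlights by document id, rather than looping over the whole highlighting dict for every result, should also keep each document's fragments under its own heading.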