solr sunspot autosuggest sunspot-rails sunspot-solr

Solr suggester dictionary not building. Java heap space error?


I'm using Solr 5.3.1 and Sunspot 2.2.7 on a Rails API with a PostgreSQL database.

I've been trying to configure an autosuggest/autocomplete feature for days but am struggling to make it work. I want a search for "foob" to return the suggestion "foobar company".

My schema.xml contains this:

<copyField source="*_text"  dest="textSpell" />
<copyField source="*_text"  dest="autocomplete" />
<copyField source="*_s"  dest="textSpell" />

This lets me copy values, both for the spellcheck (which works fine) and for the autocomplete, from the dynamic Solr field created by Sunspot:

    <dynamicField name="*_text" stored="false" type="text" multiValued="true" indexed="true"/>

This dynamicField holds the value I want to work with: title_text.

My fields for spellchecking and autocompletion look like this:

<field name="textSpell" stored="false" type="textSpell" multiValued="true" indexed="true"/>
<field name="autocomplete" stored="true" type="autocomplete" multiValued="true" indexed="true"/>

My fieldType for autocomplete looks like this:

<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
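
To make the index-time analyzer above more concrete, here is a rough Ruby sketch of what the chain (keyword tokenizer, lowercase filter, edge n-gram filter with minGramSize 1 and maxGramSize 25) would emit for a single title. It only imitates the Solr filters; `edge_ngrams` is a hypothetical helper written for illustration and is not part of Solr or Sunspot.

```ruby
# Rough imitation of the index-time analyzer of the "autocomplete" field type.
# edge_ngrams is a hypothetical helper for illustration only.
def edge_ngrams(value, min_gram: 1, max_gram: 25)
  # The keyword tokenizer keeps the whole value as one token;
  # the lowercase filter then lowercases it.
  token = value.downcase
  # The edge n-gram filter emits prefixes of the token, from min_gram
  # up to max_gram characters (or the token length, if shorter).
  upper = [max_gram, token.length].min
  (min_gram..upper).map { |size| token[0, size] }
end

grams = edge_ngrams("Foobar Company")
puts grams.inspect
```

With this chain every title is indexed as one term per prefix, so "foob" matches the indexed prefix of "foobar company" directly. It also means a title of n characters produces up to 25 terms, which probably contributes to the index growth described below.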

Then in solrconfig.xml I have my suggester component and its request handler:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggest</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="storeDir">suggester_fuzzy_dir</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">autocomplete</str>
    <str name="suggestAnalyzerFieldType">autocomplete</str>
    <str name="buildOnOptimize">true</str>
    <str name="buildOnStartup">true</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggesthandler" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">suggest</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
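
For reference, the two requests this handler is meant to serve (building the dictionary and asking for suggestions) could be assembled like this in Ruby. This is only a sketch: `suggest_uri` is a hypothetical helper, and the base URL is an assumption pieced together from the hostname, port and path that appear in the sunspot.yml further down.

```ruby
require "uri"

# Hypothetical helper: builds a URL for the /suggesthandler request handler.
# The default base URL is an assumption (localhost:8982, core at /solr/development).
def suggest_uri(query = nil, build: false, base: "http://localhost:8982/solr/development")
  params = { "wt" => "json" }
  params["suggest.q"] = query if query          # text to get suggestions for
  params["suggest.build"] = "true" if build     # ask Solr to (re)build the dictionary
  uri = URI.parse("#{base}/suggesthandler")
  uri.query = URI.encode_www_form(params)
  uri
end

build_url = suggest_uri(build: true)
query_url = suggest_uri("foob")
puts build_url
puts query_url
```

Passing either URI to `Net::HTTP.get` (after `require "net/http"`) would send the request. The build call is the one that triggers the long-running dictionary build described below.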

I have 10M+ entries in my database. My goal is autosuggestion on the title attribute.

This setup should index my title twice. Indeed, my index size doubled when I reindexed with these settings.

A suggester_fuzzy_dir folder was indeed created in my core's data folder. However, when I start Solr or send the request /suggesthandler?suggest.build=true, this folder doesn't grow: it always contains 1 byte. Meanwhile, the free space on my SSD keeps shrinking, and I wasn't able to find out where the data is being written.

After about 45 minutes I usually get a Java heap space out-of-memory error, and my disk usage returns to normal.

I tried launching Solr with the option -memory=4096m to allocate more memory (my computer has 8 GB of RAM). This still doesn't work, although it should be enough, which makes me think the problem lies elsewhere.

Edit: The error returned by Solr in the console is as follows:

{
  "error": {
    "msg": "java.lang.OutOfMemoryError: Java heap space",
    "trace": "java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space\n\tat org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:618)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:477)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:499)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\n\tat org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: java.lang.OutOfMemoryError: Java heap space\n\tat org.apache.lucene.util.packed.Packed64.<init>(Packed64.java:73)\n\tat org.apache.lucene.util.packed.PackedInts.getMutable(PackedInts.java:1009)\n\tat org.apache.lucene.util.packed.PackedInts.getMutable(PackedInts.java:976)\n\tat org.apache.lucene.util.packed.GrowableWriter.<init>(GrowableWriter.java:46)\n\tat org.apache.lucene.util.packed.PagedGrowableWriter.newMutable(PagedGrowableWriter.java:58)\n\tat org.apache.lucene.util.packed.AbstractPagedMutable.fillPages(AbstractPagedMutable.java:60)\n\tat org.apache.lucene.util.packed.PagedGrowableWriter.<init>(PagedGrowableWriter.java:52)\n\tat org.apache.lucene.util.packed.PagedGrowableWriter.<init>(PagedGrowableWriter.java:45)\n\tat org.apache.lucene.util.fst.NodeHash.rehash(NodeHash.java:164)\n\tat org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:133)\n\tat org.apache.lucene.util.fst.Builder.compileNode(Builder.java:215)\n\tat org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:310)\n\tat org.apache.lucene.util.fst.Builder.add(Builder.java:417)\n\tat org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build(AnalyzingSuggester.java:557)\n\tat org.apache.lucene.search.suggest.Lookup.build(Lookup.java:193)\n\tat org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:162)\n\tat org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:179)\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:251)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n",
    "code": 500
  }
}

Solution

  • So I finally made it work by increasing the memory allocated to the Java Virtual Machine.

    In sunspot.yml:

    development:
      solr:
        hostname: localhost
        port: 8982
        log_level: INFO
        path: /solr/development
        memory: 6G   # => This allocates 6 GB of RAM to the JVM
    

    It might have worked with a 4 GB memory allocation, though. I watched the build in real time and saw memory usage peak above 2 GB, sometimes 3 GB.

    My suggester_fuzzy_dir now weighs 1.3 GB, which makes more sense.