I have a small (5-node) cluster that I'm using to benchmark the performance of DSE Search with various indexing strategies (I'm using DSE 4.5.2 on CentOS). When I submit only a single query at a time, everything works fine, but when I ramp up the load, I quickly see Netty exceptions. In SolrJ (most of my testing has used a Java client), these manifest as:
Error with server [MY URL]: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: address already in use by: [id: 0x2d3873c4, local:E:2d3873c4 => local:local]
This seems to happen consistently whenever I have multiple threads concurrently querying the cluster, regardless of which node I'm actually pointed at and regardless of whether the requests originate from the same server. I've also been able to induce this behavior when using other clients (e.g. PySolr4), so this seems likely to be a DSE setup issue and not some weird problem with my test application.
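The kind of concurrent load described above can be approximated with a sketch like the following. It is not the original test application: it uses only the JDK, the class name `ConcurrentQueryLoad` and the thread and request counts are arbitrary choices for illustration, and the `query` callable is a hypothetical stand-in for a real client call (for example, a SolrJ `HttpSolrServer.query(...)` against one node).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentQueryLoad {

    /** Counts of requests that completed normally versus threw an exception. */
    public static final class Result {
        public final int succeeded;
        public final int failed;

        Result(int succeeded, int failed) {
            this.succeeded = succeeded;
            this.failed = failed;
        }
    }

    /**
     * Submits totalRequests invocations of query to a fixed pool of
     * `threads` workers, waits for all of them, and tallies the outcomes.
     */
    public static <T> Result run(int threads, int totalRequests, Callable<T> query)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (int i = 0; i < totalRequests; i++) {
                futures.add(pool.submit(query));
            }
            int ok = 0;
            int failed = 0;
            for (Future<T> f : futures) {
                try {
                    f.get();      // blocks until this request finishes
                    ok++;
                } catch (ExecutionException e) {
                    failed++;     // the query itself threw
                }
            }
            return new Result(ok, failed);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in: swap in a real client call here.
        Callable<String> fakeQuery = () -> "ok";
        Result r = run(16, 200, fakeQuery);
        System.out.println("succeeded=" + r.succeeded + " failed=" + r.failed);
    }
}
```

Running the same harness with one thread and then with many makes it easy to check whether failures appear only under concurrency, independent of which client library issues the requests.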
Furthermore, if I change the inter-node communication type in dse.yaml from netty to http, the same cluster will happily respond to clients submitting hundreds of concurrent requests. I had initially left the default netty* settings in dse.yaml, but after encountering the error, I played around with a few options (max_connections, acceptor/worker threads, etc.); unfortunately, I haven't found much documentation on these values, and nothing I changed seemed to have the desired effect.
Conceptually, it seems like Netty should provide much better performance than HTTP, so I'd like to figure out what may be going wrong here. Thanks in advance for any advice.
See below for a stack trace from /var/log/cassandra/system.log corresponding to one of these errors:
ERROR [http-8983-exec-4] 2014-11-19 18:35:51,557 SolrDispatchFilter.java (line 696) Error request exception: address already in use by: [id: 0x17617168, local:E:17617168 => local:local]
java.lang.RuntimeException: address already in use by: [id: 0x17617168, local:E:17617168 => local:local]
at com.datastax.bdp.search.solr.handler.shard.netty.ShardClient.tryNewConnection(ShardClient.java:204)
at com.datastax.bdp.search.solr.handler.shard.netty.ShardClient.sendTo(ShardClient.java:163)
at com.datastax.bdp.search.solr.handler.shard.netty.NettyShardHandler.submit(NettyShardHandler.java:76)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:287)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:137)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1889)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:723)
at com.datastax.bdp.search.solr.servlet.CassandraDispatchFilter.execute(CassandraDispatchFilter.java:185)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at com.datastax.bdp.search.solr.servlet.CassandraDispatchFilter.doFilter(CassandraDispatchFilter.java:147)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.cassandra.audit.SolrHttpAuditLogFilter.doFilter(SolrHttpAuditLogFilter.java:218)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.search.solr.auth.CassandraAuthorizationFilter.doFilter(CassandraAuthorizationFilter.java:100)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.search.solr.auth.DseAuthenticationFilter.doFilter(DseAuthenticationFilter.java:102)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:891)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:750)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2283)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: io.netty.channel.ChannelException: address already in use by: [id: 0x17617168, local:E:17617168 => local:local]
at io.netty.channel.local.LocalChannelRegistry.register(LocalChannelRegistry.java:46)
at io.netty.channel.local.LocalChannel.doBind(LocalChannel.java:178)
at io.netty.channel.local.LocalChannel$LocalUnsafe.connect(LocalChannel.java:349)
at io.netty.channel.DefaultChannelPipeline$HeadHandler.connect(DefaultChannelPipeline.java:1008)
at io.netty.channel.DefaultChannelHandlerContext.invokeConnect(DefaultChannelHandlerContext.java:495)
at io.netty.channel.DefaultChannelHandlerContext.connect(DefaultChannelHandlerContext.java:480)
at io.netty.channel.DefaultChannelHandlerContext.connect(DefaultChannelHandlerContext.java:465)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:847)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:199)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:165)
at io.netty.channel.local.LocalEventLoop.run(LocalEventLoop.java:33)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
... 1 more
This seems to be a Netty/JVM bug, related to https://github.com/netty/netty/issues/1765. Could you try another OS and/or JVM version?