I set up redis and nutcracker on CentOS 6.4 and am trying to connect using the ServiceStack.Redis client, and I found a major performance issue.
For testing I left only 1 redis instance:
beta:
  listen: 0.0.0.0:22122
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: true
  #timeout: 5000
  #server_retry_timeout: 2000
  #server_failure_limit: 3
  redis: true
  servers:
  #- 127.0.0.1:6379:1
  - 127.0.0.1:6380:1
In the following unit test I'm trying to send 100k strings to redis via nutcracker.
using System;
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using ServiceStack.Redis;

[TestClass]
public class RedisProxyTest
{
    public string host = "192.168.56.112";
    //public int port = 6379;  // redis directly
    public int port = 22122;   // via nutcracker

    [TestMethod]
    public void TestMethod1()
    {
        var key = "l2";
        var count = 100000;

        using (var redisClient = new RedisClient(host, port))
        {
            // Build 100k random strings and push them to a redis list in one call.
            var list = new List<string>();
            for (int i = 0; i < count; i++)
            {
                list.Add(Guid.NewGuid().ToString());
            }
            Utils.TimeLog("Remove", () => redisClient.Remove(key));
            Utils.TimeLog("AddRangeToList", () => redisClient.AddRangeToList(key, list));
        }

        using (var redisClient = new RedisClient(host, port))
        {
            // Read back the second half of the list on a fresh connection.
            redisClient.GetListCount(key);
            Utils.TimeLog("GetRangeFromList", () =>
            {
                var ret = redisClient.GetRangeFromList(key, count / 2, count - 1);
                Console.WriteLine(ret.Count);
            });
        }
    }
}
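Utils.TimeLog is not shown in the question; a minimal stand-in consistent with the console output below (an assumption, purely so the test compiles) could be:

using System;
using System.Diagnostics;

public static class Utils
{
    // Hypothetical helper: run the action and print its name and elapsed seconds.
    public static void TimeLog(string name, Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        Console.WriteLine("{0}: {1}", name, sw.Elapsed.TotalSeconds);
    }
}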
On the first few runs after nutcracker is restarted, AddRangeToList completes in 1-2 seconds. On subsequent runs, however, AddRangeToList performance drops dramatically: it takes anywhere from a few minutes to more than 20 minutes (if no timeout is configured). I cannot reproduce the same behavior when using redis directly. I haven't tried any other client yet. Any ideas why?
This is what I see in the console after the unit test run:
Test Name: TestMethod1
Test Outcome: Passed
Remove: 0.0331171
AddRangeToList: 806.8219166
50000
GetRangeFromList: 1.741737
If nutcracker is proxying several tens of thousands of connections or sending multi-get requests with several thousands of keys, you should use an mbuf size of 512.
The following link explains how to interpret the mbuf size: https://github.com/twitter/twemproxy/issues/141
Every client connection consumes at least one mbuf. To service a request we need two connections (one from client to proxy and another from proxy to server), so we need two mbufs.
A fragmentable request like 'get foo bar\r\n', which btw gets fragmented to 'get foo\r\n' and 'get bar\r\n', would consume two mbufs for the request and two mbufs for the response. So a fragmentable request with N fragments needs N * 2 mbufs.
The good thing about mbufs is that the memory comes from a reuse pool. Once an mbuf is allocated, it is never freed but just put back into the reuse pool. The bad thing is that once an mbuf is allocated it is never freed, since a freed mbuf always goes back to the reuse pool - https://github.com/twitter/twemproxy/blob/master/src/nc_mbuf.c#L23-L24 (this can be fixed by putting a threshold parameter on the reuse pool).
So, if nutcracker is handling, say, 1K client connections and 100 server connections, it would consume (max(1000, 100) * 2 * mbuf-size) of memory for mbufs. If we assume that clients are sending non-pipelined requests, then with the default mbuf-size of 16K this would consume 32M in total.
Furthermore, if on average every request has 10 fragments, then the memory consumption would be 320M. Instead of handling 1K client connections, let's say you were handling 10K; then the memory consumption would be 3.2G. Now if instead of the default mbuf-size of 16K you used 512 bytes, memory consumption for the same scenario would drop to 1000 * 2 * 512 * 10 = 10M.
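To make the arithmetic concrete, here is a small sketch of the same estimate (the connection counts, fragment counts and mbuf sizes are the example figures from the text above, not measurements from this setup):

using System;

class MbufEstimate
{
    // max(client, server) connections * 2 mbufs per request/response path,
    // times the average fragments per request, times the mbuf size in bytes.
    static long Estimate(long clientConns, long serverConns, long fragments, long mbufSize)
    {
        return Math.Max(clientConns, serverConns) * 2 * fragments * mbufSize;
    }

    static void Main()
    {
        Console.WriteLine(Estimate(1000, 100, 1, 16384));   // ~32M:  1K clients, non-pipelined, 16K mbufs
        Console.WriteLine(Estimate(1000, 100, 10, 16384));  // ~320M: 10 fragments per request
        Console.WriteLine(Estimate(10000, 100, 10, 16384)); // ~3.2G: 10K client connections
        Console.WriteLine(Estimate(1000, 100, 10, 512));    // ~10M:  512-byte mbufs instead of 16K
    }
}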
This is the reason why, for a 'large number' of connections, you want to choose a small value for mbuf-size, like 512.
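For reference, the mbuf size is not set in the pool section of the YAML config; in the twemproxy builds I have used it is passed when starting nutcracker via the -m (--mbuf-size) command-line option, so switching to 512-byte mbufs means restarting the proxy with that option instead of the 16384-byte default.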