memcached, hashtable, apc

How to make an APC cache based on distributed hash tables (like memcache)?


I've read an article about distributed hash tables, and it seems possible to implement something like memcache with APC. As you know, APC is much faster than memcache when we're fetching keys from a single server. So if we make APC distributed, we get both performance and distribution. I need some thoughts to get started. Could someone who is familiar with hash tables explain how to do that? How can APC be made to work like memcache?
If you know something about keyspace partitioning and overlay networks, that would be even better.


Solution

  • Although on the surface both pieces of software provide a comparable service, their underpinnings are entirely different, and that explains the dramatic difference in performance.

    APC is basically a system that lets you store objects (user objects or parsed opcode chunks) in shared memory. Shared memory, in all systems I know, is as fast as local RAM once you have obtained a pointer to it.

    So, in short, what APC has to do to write or read an object is:

    • request shm access and obtain a pointer to it
    • calculate object offset and size in the shm
    • memcpy that memory zone into a buffer or vice versa
    • done

    Simple and, given that memory bandwidth nowadays is tens of gigabytes per second, quick.
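
    The steps above can be sketched in a few lines. This is only an illustration in Python using the standard multiprocessing.shared_memory module, not APC's actual implementation: the fixed-slot layout, SLOT_SIZE, NUM_SLOTS and the function names are all assumptions made for the example.

```python
from multiprocessing import shared_memory

# Hypothetical layout: fixed-size slots, each a 4-byte length prefix plus payload.
SLOT_SIZE = 64
NUM_SLOTS = 16


def slot_offset(slot: int) -> int:
    """Calculate where a slot starts inside the shared segment."""
    return slot * SLOT_SIZE


def shm_store(buf, slot: int, value: bytes) -> None:
    """Copy the value into the shared segment (the memcpy step)."""
    if len(value) > SLOT_SIZE - 4:
        raise ValueError("value does not fit in one slot")
    off = slot_offset(slot)
    buf[off:off + 4] = len(value).to_bytes(4, "little")
    buf[off + 4:off + 4 + len(value)] = value


def shm_fetch(buf, slot: int) -> bytes:
    """Copy the value out of the shared segment into a private buffer."""
    off = slot_offset(slot)
    size = int.from_bytes(buf[off:off + 4], "little")
    return bytes(buf[off + 4:off + 4 + size])


def demo() -> bytes:
    # Request shared memory and obtain a handle to it.
    shm = shared_memory.SharedMemory(create=True, size=SLOT_SIZE * NUM_SLOTS)
    try:
        shm_store(shm.buf, 3, b"hello")
        # A second handle attached by name stands in for another process.
        other = shared_memory.SharedMemory(name=shm.name)
        try:
            return shm_fetch(other.buf, 3)
        finally:
            other.close()
    finally:
        shm.close()
        shm.unlink()
```

    Nothing here is encoded, sent or parsed: a read is an offset calculation followed by a memory copy.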

    In a memcache scenario, because of its distributed nature, more needs to be done:

    • client encodes and transmits request
    • server receives and decodes request
    • server calculates object offset and size in memcached's memory
    • server memcpy's that memory zone into a buffer or vice versa
    • server transmits buffer
    • client receives and decodes buffer
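
    To make those six steps concrete, here is a rough Python sketch of a single "get" round trip over a local socket pair. The wire format is only loosely modeled on memcached's text protocol, and serve_one, client_get and recv_until are names invented for this example; a real client and server handle framing, errors and concurrency far more carefully.

```python
import socket
import threading


def recv_until(sock, terminator: bytes) -> bytes:
    """Read from the socket until the data ends with the terminator."""
    data = b""
    while not data.endswith(terminator):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def serve_one(conn, store: dict) -> None:
    """Server: receive and decode one request, look the key up, send the reply."""
    with conn:
        request = recv_until(conn, b"\r\n")          # e.g. b"get greeting\r\n"
        command, key = request.split()
        value = store.get(key) if command == b"get" else None
        if value is None:
            reply = b"END\r\n"
        else:
            reply = b"VALUE %s 0 %d\r\n%s\r\nEND\r\n" % (key, len(value), value)
        conn.sendall(reply)


def client_get(sock, key: bytes):
    """Client: encode and send the request, then receive and decode the reply."""
    sock.sendall(b"get " + key + b"\r\n")
    # Simplification: assumes the stored value itself does not end in b"END\r\n".
    reply = recv_until(sock, b"END\r\n")
    if reply == b"END\r\n":
        return None
    header, rest = reply.split(b"\r\n", 1)
    size = int(header.split()[3])
    return rest[:size]


def demo(key: bytes = b"greeting"):
    client, server = socket.socketpair()
    store = {b"greeting": b"hello"}  # stands in for the server's in-memory table
    worker = threading.Thread(target=serve_one, args=(server, store))
    worker.start()
    with client:
        value = client_get(client, key)
    worker.join()
    return value
```

    Even without a real network in between, every lookup now passes through the kernel's socket layer twice and is copied several times along the way.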

    Now, if we want to distribute APC, the client and server will need to talk to each other. And all of a sudden we find ourselves in a scenario that, with the exception of a few less important details, is identical to the one used by memcache. All the expensive operations become necessary again, i.e. all the copying around, including sending data through the network stack.
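
    On the keyspace partitioning part of the question: before any request is sent, the client has to decide which node owns a key. A common way to do that is a consistent hash ring. The Python sketch below is a minimal version under assumptions of my own (SHA-256 as the hash, 100 virtual points per node, made-up node names); it is not what any particular memcache client ships.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent hash ring: each node owns many points on a circle."""

    def __init__(self, nodes, replicas: int = 100):
        ring = []
        for node in nodes:
            for i in range(replicas):
                # Several virtual points per node even out the key distribution.
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [node for _, node in ring]

    @staticmethod
    def _hash(text: str) -> int:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        """Return the node owning the first point clockwise from the key's hash."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owners[idx]


# Hypothetical node names, for illustration only.
NODES = ["web1:11211", "web2:11211", "web3:11211"]
ring = HashRing(NODES)
owner = ring.node_for("user:42")
```

    The useful property is that removing a node only remaps the keys that node owned; keys on the remaining nodes stay where they were. None of this removes the network round trip, it only decides where the round trip goes.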

    That also explains why, even with a memcache instance running on localhost, without horribly slow gigabit Ethernet between the nodes, there is considerable overhead in what needs to be done to make a distributed system work.

    And that's why I'm convinced you're looking at the wrong suspect here: make APC distributed and it will end up in the same performance/throughput category as memcache.