Key groups with APC cache

APC lets you store data inside keys, but you cannot group these keys.

So if i want to have a group called "articles", and inside this group I would have keys that take the form of the article ID I can't do this easily.

articles -> 5   -> cached data
         -> 10  -> cached data
         -> 17  -> cached data

         ...

I could prefix the key with the "group" name like:

article_5   -> cached data
article_10  -> cached data
article_17  -> cached data

 ...

But this it makes it impossible to delete the entire group if I want to :(

A working solution would be to store multidimensional arrays (this is what I'm doing now), but I don't think it's good because when I want to access / or delete cached data, I need to get the entire group first. So if the group has one zillion articles in it you can image what kind of array I will be iterating and searching

Do you have better ideas on how could I achieve the group thing?

edit: found another solution, not sure if it's much better because I don't know how reliable is yet. I'm adding a special key called __paths which is basically a multidimensional array containing the full prefixed key paths for all the other entries in the cache. And when I request or delete the cache I use this array as a reference to quickly find out the key (or group of keys) I need to remove, so I don't have to store arrays and iterate trough all keys...

Solution

Based upon your observations, I looked at the underlying C implementation of APC's caching model (apc_cache.c) to see what I could find.

The source corroborates your observations that no grouping structure exists in the backing data store, such that any loosely-grouped collection of objects will need to be done based on some namespace constraint or a modification to the cache layer itself. I'd hoped to find some backdoor relying on key chaining by way of a linked list, but unfortunately it seems collisions are reconciled by way of a direct reallocation of the colliding slot instead of chaining.

Further confounding this problem, APC appears to use an explicit cache model for user entries, preventing them from aging off. So, the solution Emil Vikström provided that relies on the LRU model of memcached will, unfortunately, not work.

Without modifying the source code of APC itself, here's what I would do:

Define a namespace constraint that your entries conform to. As you've originally defined above, this would be something like article_ prepended to each of your entries.
Define a separate list of elements in this set. Effectively, this would be the 5, 10, and 17 scheme you'd described above, but in this case, you could use some numeric type to make this more efficient than storing a whole lot of string values.
Define an interface to updating this set of pointers and reconciling them with the backing memory cache, including (at minimum) the methods insert, delete, and clear. When clear is called, walk each of your pointers, reconstruct the key you used in the backing data store, and flush each from your cache.

What I'm advocating for here is a well-defined object that performs the operations you seek efficiently. This scales linearly with the number of entries in your sub-cache, but because you're using a numeric type for each element, you'd need over 100 million entries or so before you started to experience real memory pain at a constraint of, for example, a few hundred megabytes.

Tamas Imrei beat me to suggesting an alternate strategy I was already in the process of documenting, but this has some major flaws I'd like to discuss.

As defined in the backing C code, APCIterator is a linear time operation over the full data set when performing searches (using its constructor, public __construct ( string $cache [, mixed $search = null ...]] )).

This is flatly undesirable in the case where the backing elements you're searching for represent a small percentage of your total data, because it would walk every single element in your cache to find the ones you desire. Citing apc_cache.c:

/* {{{ apc_cache_user_find */
apc_cache_entry_t* apc_cache_user_find(apc_cache_t* cache, char *strkey, \
  int keylen, time_t t TSRMLS_DC)
{
    slot_t** slot;
    ...
    slot = &cache->slots[h % cache->num_slots];
    while (*slot) {
        ...
        slot = &(*slot)->next;
    }
}

Therefore, I would most strongly recommend using an efficient, pointer-based virtual grouping solution to your problem as I've sketched out above. Although, in the case where you're severely memory-restricted, the iterator approach may be most correct to conserve as much memory as possible at the expense of computation.

Best of luck with your application.