A PHP or Python script periodically fetches a large dataset of IP addresses (/32 netmask) from a remote database. Between fetches, the dataset is stored temporarily in APC or Memcached.
The main job of the script is to check whether a given IP address exists in the database/cache (think: "blacklist").
What would be the most efficient (performance-wise) way to:
What I have come up with so far:
Alternative 1: Store all IP addresses as one large array in the value of a single key in APC, then do a
if (in_array("", $ip_list_from_cache))
Alternative 2: Store each IP as a key name in APC, then do a
if (apc_exists(''))
This is a large list and I want the lookup to be very fast.
Thanks in advance for any comments!
The solution to performance dilemmas is usually to benchmark both solutions.
In this case, though, I'd say the key-per-IP approach (Alternative 2) makes a lot more sense: the time complexity of in_array is O(N), that is, a linear sweep. Caches, on the other hand, are usually implemented as hash tables, where lookup is O(1) on average.
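To get a feel for the gap, here is a rough Python sketch that times a linear scan (a list, which behaves like in_array) against a hash lookup (a set, which behaves like a keyed cache). The addresses, the list size, and the probe value are made up for illustration, and the timings will differ on your hardware, so treat it as a starting point for your own benchmark rather than a result.

```python
import timeit

# Hypothetical blacklist: 200,000 made-up addresses in 10.0.0.0/8.
ips = [f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}" for i in range(200_000)]

ip_list = ips        # linear scan on lookup, like in_array
ip_set = set(ips)    # hash lookup, like one cache key per IP

# An address that is not in the blacklist: the worst case for the scan,
# because every element has to be compared before giving up.
probe = "192.0.2.1"

list_time = timeit.timeit(lambda: probe in ip_list, number=20)
set_time = timeit.timeit(lambda: probe in ip_set, number=20)

print(f"list scan:  {list_time:.6f} s for 20 lookups")
print(f"set lookup: {set_time:.6f} s for 20 lookups")
```

Both structures give the same answer; only the cost of each lookup changes as the list grows.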
Also, if you aggregate the records in Memcached, you'll avoid wasting a lot of RAM duplicating the list in memory once per web worker process.
It would also arguably be a much cleaner solution.
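A minimal sketch of the one-key-per-IP layout, in Python. FakeCache is a hypothetical in-memory stand-in so the example runs on its own; it is not a real Memcached or APC client, and the "blacklist:" prefix, the refresh/is_blacklisted helpers, and the 600-second TTL are assumptions for illustration. With a real client you would swap in its own get/set calls.

```python
import time


class FakeCache:
    """In-memory stand-in for a shared key store (illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl):
        # Store the value together with its expiry time.
        self._data[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries behave as if they were never stored.
            del self._data[key]
            return None
        return value


PREFIX = "blacklist:"  # hypothetical key namespace


def refresh(cache, addresses, ttl=600):
    """Write one key per IP; called after each fetch from the database."""
    for ip in addresses:
        cache.set(PREFIX + ip, 1, ttl)


def is_blacklisted(cache, ip):
    """A single key lookup; no list is loaded into the worker."""
    return cache.get(PREFIX + ip) is not None


cache = FakeCache()
refresh(cache, ["192.0.2.10", "198.51.100.7"])
print(is_blacklisted(cache, "192.0.2.10"))   # True
print(is_blacklisted(cache, "203.0.113.5"))  # False
```

The TTL lets stale entries drop out on their own if a refresh is missed, though you may prefer explicit deletes when an address leaves the blacklist.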
On a side note, did you consider doing this at another level? With some light scripting, you could do your checks at the LB (e.g. Nginx) level.