I have 40 lacks IP ranges like below image, need to find IP details with that IP ranges.
How to efficiently do it within 5ms time. Which DB need to use to store data and to query?
Tried with the following ways.
I have tried with the following solution, but it will efficient with an in-memory array, I need to find in 40 lacks rows, So, it won't be efficient.
Also tried with MongoDB, Stored all 40 lacks rows in mongo collection, used $gte
and $lte
query. But it's response time is more than 150ms with local mongo server. For me, response time should be less than 5ms.
This data modeling pattern is used by many Aerospike users. The efficient way to do it is to have your from and to columns ip addresses in 32 bit integer format, and from and to have common first 24 bits.
for eg: from: 1.0.0.0 to: 1.0.0.255 - store as 32 bit ints. You lookup this record with its primary key which you will set as the common 24 bit value. i.e. 1.0.0 as the 24 bit integer. So if you want to lookup 1.0.0.21 ... you just go ahead and lookup the data in "1.0.0" primary key.
On Aerospike, this kind of read can be at sub-millisecond performance.
In your case, your ranges are not consistent. So, you have two choices - 1 - recreate your data in the 24 bit common format as I mention above, this will cause some of rows to become multiple rows but the lookup will be extremely fast. For example, from: 1.0.0.0 to: 1.0.1.255 will be separated with same rest of the data into 1.0.0.0 - 1.0.0.255 and 1.0.1.0 to 1.0.1.255. This means higher total number of records which is not an issue for Aerospike - you can easily store billions of records without affecting read latency of individual records.
Or you can use secondary index query on "from" - your ip > from, and add a predicate filter expression where your ip is < to, again your ip, from, to all in 32 bit int format which will return exactly one record back, if found. See https://www.aerospike.com/docs/guide/predicate.html This will be a bit slower than the first method I described but could still be under 5ms - you will have to test and see.
11/21/19 update - there is another way which may be easier. Use the to field as a 32 bit integer as the primary key for the record. Create another record, in-memory namespace, as a sorted list of just all "to" entries - 400,000 in your case - which will fit in one record. Use LIST type, relative indexing to search by value and return relative index 0, which will return the correct to value, which is also your primary key. You can then retrieve the entire record with it. Two reads, should be well under 5ms.