I have a large data set (numeric data, for example 200,000 rows of numbers) in a Django database. The client passes in another set of data, for example 100-500 numbers, and the server needs to find out which of those numbers are already in the database. Let's say the numbers are phone numbers. If I just do a regular number-by-number comparison, the server can't even handle 2-3 requests from clients.
Please suggest a solution to my problem.
Are the numbers unique? Are they keyed?
SELECT num FROM table WHERE num IN (111,222,333,....500 numbers later..., 99999)
That should give you a list of the numbers that are in the db. Take that list, compare it against your set, and take the difference.
Most SQL DBs will take a SQL statement that size, and it's actually quite performant. If you're only interested in existence, the DB will likely just scan the index and never hit the actual rows (depending on the DB, of course).
So, try that and see how it works. If your numbers aren't indexed, then you're doomed at the gate -- fix that too.
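Here is a minimal sketch of that query from Python. It uses the standard-library sqlite3 module with an in-memory database purely so the example runs on its own; the table name phone_numbers and the helper find_existing are made up for illustration, and your real database and schema will differ.

```python
import sqlite3


def find_existing(conn, candidates):
    """Return the set of candidate numbers already stored in phone_numbers.num.

    `phone_numbers` is a hypothetical table name used for this sketch.
    """
    candidates = list(candidates)
    if not candidates:
        # "IN ()" is not valid SQL, so skip the query for an empty list.
        return set()
    # One "?" placeholder per value keeps the values out of the SQL text.
    placeholders = ",".join("?" for _ in candidates)
    sql = f"SELECT num FROM phone_numbers WHERE num IN ({placeholders})"
    return {row[0] for row in conn.execute(sql, candidates)}


# Tiny in-memory database standing in for the real 200,000-row table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_numbers (num INTEGER)")
conn.execute("CREATE UNIQUE INDEX i1 ON phone_numbers(num)")
conn.executemany(
    "INSERT INTO phone_numbers (num) VALUES (?)",
    [(n,) for n in range(1, 7)],  # stores 1..6
)

existing = find_existing(conn, [1, 5, 7])
print(existing)
```

With the Django ORM, the same lookup is usually written as PhoneNumber.objects.filter(num__in=candidates).values_list("num", flat=True), where PhoneNumber is a hypothetical model name standing in for yours.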
Addenda:
Simply, if your number is unique, you need to ensure that you have an index on that number's column in your database. If you want to enforce that it remains unique, you can make it a unique index, but that's not required:
CREATE UNIQUE INDEX i1 ON table(num)
If you don't have the index, the db will continually scan all of the rows of the table, which is not what you want.
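One way to check whether the index is actually being used is to ask the database for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through the standard-library sqlite3 module; the table name phone_numbers is an assumption, and other databases have their own EXPLAIN syntax and output, so treat this as an illustration rather than a recipe for your particular DB.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    # The detail text is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# `phone_numbers` is a hypothetical table name for this sketch.
conn.execute("CREATE TABLE phone_numbers (num INTEGER)")

query = "SELECT num FROM phone_numbers WHERE num IN (1, 5, 7)"

# Without an index the plan describes a scan over the whole table.
plan_before = plan_details(conn, query)

conn.execute("CREATE UNIQUE INDEX i1 ON phone_numbers(num)")

# With the index in place the plan should mention index i1 instead.
plan_after = plan_details(conn, query)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, so the useful signal is simply whether the index name shows up in it.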
And, yes, the 111,222,333 are the numbers passed from the clients that you're checking for.
Let's say that you had the numbers 1,2,3,4,5,6 in your database, and the client's list is 1,5,7. When you execute the SELECT num FROM table WHERE num IN (1,5,7) you will get back 2 rows: 1 and 5.
So, you'll need to compare the result numbers, 1,5, to your list, 1,5,7. I don't know enough Python, much less Django, to give you a good example, but a quick glance shows that they have 'set' objects. With these you could do:
newSet = clientSet.difference(dbSet)
where clientSet is the set of numbers from the client, dbSet is the set of numbers returned by the query above, and newSet is the set of numbers that the client has that are not in the db.
I may be misusing the set operator 'difference', but that's the gist of it.
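For what it's worth, here is a small runnable sketch of that comparison using the 1,5,7 example from above. The variable names are only illustrative, and db_set is hard-coded here where the real code would fill it from the query result.

```python
# Numbers sent by the client.
client_set = {1, 5, 7}

# Numbers the SELECT ... IN (...) query returned (hard-coded for this sketch).
db_set = {1, 5}

# Numbers the client has that are NOT in the database.
new_set = client_set.difference(db_set)

# Numbers the client has that ARE already in the database.
already_present = client_set & db_set

print(new_set)
print(already_present)
```

Python's set.difference returns the elements of the first set that are missing from the second, so it does match the usage described above.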