We are working on large volume data (row counts given below) :
Table 1 : 708408568 rows -- 708 million
Table 2 : 1416817136 rows -- 1.4 billion
Table 1 Schema:
----------------
ID - Int PK
column2 - Int
Table 2 Schema
----------------
Table1ID - Int FK
SomeColumn - Int
SomeColumn - Int
Table1 has PK1 which servers as FK for Table 2.
Table1 :
PK Clustered Index on Id
Non Clustered (Non Unique) on column2
Table 2 :
Table1ID (FK) Clustered Index
Below is the query which needs to be executed :
SELECT t1.[id]
,t1.[column2]
FROM Table1 t1
inner join Table2 t2
on s.id = cs.id
WHERE t1.[column2] in (select [id] from ConvertCsvToTable('1,2,3,4,5.......10000')) -- 10,000 Comma seperated Ids
So to summarize, The inner join on ID should be handled by the clustered index on the same Ids on both PK and FK. and as for the "huge" Where condition on column2 we have a nonclustered index.
However, the query is taking 4 minutes for a small subset of 100 Ids, we need to pass 10,000 ids.
Is there a better way design wise that we can do this, or possibly does Table Partitioning help?
Just wanted to get some ways of how to solve huge volume Select with Inner Join and Where IN.
Note : ConvertCsvToTable is a Split function which has already been determined to perform optimally.
Thanks !
This is what I would try: Create a temp table with the structure of the return from the function. Make sure to set the column ID as primary key so that the optimizer takes it into consideration...
CREATE TABLE #temp
(id int not null
...
,PRIMARY KEY (id) )
then call the function
insert into #temp exec ConvertCsvToTable('1,2,3,4,5.......10000')
then use the temp table directly joined in the query
SELECT t1.[id], t1.[column2]
FROM Table1 t1, t2, #temp
where t1.id = t2.id
and t1.[column2] = #temp.id