sql-server-2008 t-sql large-data-volumes

Inner join and Split on large volume of data

We are working with a large volume of data (row counts given below):

Table 1 : 708408568 rows  -- 708 million
Table 2 : 1416817136 rows -- 1.4 billion

Table 1 Schema:
----------------
ID -      Int PK
column2 - Int

Table 2 Schema
----------------
Table1ID - Int FK
SomeColumn - Int
SomeColumn - Int

Table1's primary key (ID) serves as the foreign key (Table1ID) in Table2.

Index details :

Table1 : 
PK Clustered Index on Id
Non Clustered (Non Unique) on column2

Table 2 :
Table1ID (FK) Clustered Index

Below is the query which needs to be executed :

SELECT t1.[id]
      ,t1.[column2]
FROM  Table1 t1
inner join Table2 t2
    on t1.[id] = t2.[Table1ID]
WHERE t1.[column2] in (select [id] from ConvertCsvToTable('1,2,3,4,5.......10000')) -- 10,000 comma-separated Ids

To summarize: the inner join on ID should be handled by the clustered indexes on the matching PK and FK columns, and the "huge" WHERE condition on column2 should be handled by the nonclustered index.

However, the query takes 4 minutes for a small subset of 100 Ids, and we need to pass 10,000 Ids.

Is there a better way to design this, or would table partitioning help?

I am looking for ways to handle a huge-volume SELECT with an inner join and a WHERE ... IN clause.

Note: ConvertCsvToTable is a split function which has already been determined to perform optimally.

Thanks!

Solution

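This is what I would try: create a temp table with the same structure as the function's result set. Make sure to declare the id column as the primary key so that the optimizer takes it into consideration. A temp table also carries statistics, whereas if ConvertCsvToTable is a multi-statement table-valued function, the optimizer has to guess its row count.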

CREATE TABLE #temp
(id    int          not null
    ...
,PRIMARY KEY (id) )

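Then load it from the function. The question queries ConvertCsvToTable with SELECT ... FROM, so it is assumed here to be a table-valued function, which is read with INSERT ... SELECT rather than EXEC:

-- DISTINCT guards against repeated ids, which the primary key would reject
insert into #temp (id)
select distinct [id] from ConvertCsvToTable('1,2,3,4,5.......10000')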

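Then join the temp table directly in the query:

SELECT t1.[id], t1.[column2]
FROM  Table1 t1
inner join Table2 t2
    on t1.[id] = t2.[Table1ID]   -- clustered PK to clustered FK
inner join #temp tmp
    on t1.[column2] = tmp.[id]   -- replaces the WHERE ... IN filter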
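
The steps above can be tried end to end on a small scale before running them against the real tables. The following is only a sketch under stated assumptions: #Table1 and #Table2 are tiny hypothetical stand-ins that copy just the columns and indexes named in the question, the index names are made up, and the id list is hard-coded instead of coming from ConvertCsvToTable so that the script runs on its own.

```sql
-- Hypothetical stand-ins for Table1 and Table2 (names and data are illustrative only).
CREATE TABLE #Table1 (id int NOT NULL PRIMARY KEY CLUSTERED, column2 int NOT NULL);
CREATE NONCLUSTERED INDEX IX_Table1_column2 ON #Table1 (column2);

CREATE TABLE #Table2 (Table1ID int NOT NULL, SomeColumn int NULL);
CREATE CLUSTERED INDEX IX_Table2_Table1ID ON #Table2 (Table1ID);

INSERT INTO #Table1 (id, column2) VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO #Table2 (Table1ID, SomeColumn) VALUES (1, 100), (1, 101), (2, 200), (3, 300);

-- The filter list, keyed so the optimizer knows the ids are unique.
-- Against the real data this would be loaded from ConvertCsvToTable instead.
CREATE TABLE #temp (id int NOT NULL PRIMARY KEY);
INSERT INTO #temp (id) VALUES (10), (30);

-- Same shape as the real query: PK-to-FK join plus a join to the id list.
SELECT t1.id, t1.column2
FROM #Table1 t1
INNER JOIN #Table2 t2 ON t1.id = t2.Table1ID
INNER JOIN #temp tmp ON t1.column2 = tmp.id;

-- Clean up the temp objects.
DROP TABLE #temp;
DROP TABLE #Table2;
DROP TABLE #Table1;
```

Note that because Table2 can hold several rows per Table1 id, the join returns one row per matching Table2 row, so an id from Table1 may appear more than once in the result.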