Search code examples
sqlsqlperformance

Very huge IN clause in SQL expression


Let's suppose i have a product table with 1 million records (yes, i have 1 million products !).

This product table has a reference string field.

Some people wants to export a csv file of all product excepting some products.

We know the references of the product to exclude. They are stored in a csv file.

Here is the query i've built:

SELECT ... FROM products WHERE reference NOT IN ('ref1','ref2','.....')

Everything works fine with less than 100 references to exclude.

What should i do if i have more than 10000 references to exclude ? The sql query is very big.

I have tried with mysql, postgre and sql server. It's the same problem.

Thanks


Solution

  • You can use table value parameters on SQL Server.

    The query would look like:

    SELECT *
    FROM products AS P
    LEFT JOIN @exludedProducts AS EP ON EP.reference = P.reference
    WHERE P.reference IS NULL
    

    You'll need to declare the table value type like (check the type to match your table):

    CREATE TYPE dbo.ProductReferenceTvp AS TABLE 
    (
        reference VARCHAR(10) NOT NULL
    )
    

    And your .Net code might look like:

    public void GetProducts(IEnumerable<string> excludedProducts)
    {
        StringBuilder sb = new StringBuilder();
                
        sb.AppendLine(" SELECT * ");
        sb.AppendLine(" FROM products AS P ");
        sb.AppendLine(" LEFT JOIN @exludedProducts AS EP ON EP.reference = P.reference ");
        sb.AppendLine(" WHERE P.reference IS NULL ");
                
        using (var cn = new SqlConnection(ConnectionString))
        {
            using (SqlCommand cmd = new SqlCommand(sb.ToString(), cn))
            {
                var pExcludedProducts = GetTvp("@exludedProducts", excludedProducts.Distinct());
                cmd.Parameters.Add(pExcludedProducts);
    
                DataSet ds = new DataSet();
                new SqlDataAdapter(cmd).Fill(ds);
                Print(ds);
            }
        }
    }
    
    private SqlParameter GetTvp(string name, IEnumerable<string> excludedProducts)
    {
        var dt = new DataTable();
        dt.Columns.Add("reference", typeof(String));
        foreach (var product in excludedProducts)
        {
            dt.Rows.Add(product);
        }
        var p = new SqlParameter(name, dt);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.ProductReferenceTvp";
        return p;
    }