Search code examples
sqlsql-servert-sqlsql-update

UPDATE a large set of rows based on a list of IDs in a file


I have a file with rather large list (160 000) of IDs (bigint) which are used for clustered index. For each of these I need to update two bit columns. My first trial was via C# script in LinqPad which translates into SQL. To illustrate what I am trying to achieve:

var reports = File.ReadAllText((@"C:\list.txt")).Split(new[] { "\n" }, StringSplitOptions.None);
reports
    .ToList()
    .ForEach(r => {
            var row = DatabaseRecords.First(u => u.UserUsageId == long.Parse(r));
            
            row.IsYummy = true;
            row.isJuicy = true;
            
            SubmitChanges();
    });

but it is far from what I need in terms of performance and I want do it in SQL itself, but can't come up with a more performant way to do so.

I was thinking about somehow filling a temporary table and then perform a select from it within an update clause?

EDIT: Here is the execution plan:enter image description here


Solution

  • You can perform mostly type-safe bulk SQL database updates with expressive LINQ via linq2db. Note that linq2db works with most of the "major" database engines, so the code should be quite portable. Currently the example below has been tested on SQLServer 2022 Dev.

    A) Declare your Tables (Models)

    If you are using EntityFrameworkCore you can reuse your EF models (linq2db.EntityFrameworkCore). Note that the DataType attribute has some SQLServer specific config. The rest of the attributes are engine agnostic.

    [Table("Food")]
    public class Food
    {
        [PrimaryKey, Identity] public long Id { get; set; }
    
        [Column, NotNull, DataType(DataType.VarChar, DbType = "VARCHAR(32)")]
        public string Name { get; set; }
    
        [Column, NotNull] public bool IsYummy { get; set; }
        [Column, NotNull] public bool IsJuicy { get; set; }
    }
    
    public class TempTable
    {
        // Adds a constraint on the TempTable which will prevent duplicate Ids
        // (remove if duplicates are allowed)
        [PrimaryKey] public long Id { get; set; }
    }
    

    B) File Read into Temp Table then SQL update with Inner Join

    See "Full Source Code Example" below for "LINQ query syntax"

    using (var db = new DataConnection(options))
    {
        using var tempTable = db.CreateTempTable<TempTable>("#FoodUpdate");
        var foodTable = db.GetTable<Food>();
    
        // ------------- Insert Into TempTable -------------
        var lines = File
            .ReadAllLines(listTxtPath)
            .Select(long.Parse)
            .Select(id => new TempTable { Id = id });
        
        tempTable.BulkCopy(lines); // <-- inserting into db happens here
        // -------------------------------------------------
    
        // ------------- Update from TempTable -------------
        var count1 = foodTable
            .InnerJoin(tempTable, (f, t) => f.Id == t.Id, (f, t) => f)
            .Set(f => f.IsYummy, true)
            .Set(f => f.IsJuicy, true)
            .Update(); // <-- update statement is generated and run
        // -------------------------------------------------
    }
    

    Performance

    These are just ball park numbers, using a consumer grade CPU, RAM, SSD PC (.NET 8). The text file is ~3.3MiB contains ~400_000 lines and the database table contains 2_000_000 records. The total time taken for the entire process reading the text file and updating the database table is ~1s. This process is fairly efficient given it's using BulkCopy and a single SQL UPDATE.

    [Test 01] Test Completed, 400314/2000000 records updated in 1246ms
    [Test 02] Test Completed, 399219/2000000 records updated in 1058ms
    [Test 03] Test Completed, 400494/2000000 records updated in 1037ms
    [Test 04] Test Completed, 400599/2000000 records updated in 1078ms
    [Test 05] Test Completed, 400587/2000000 records updated in 1034ms
    

    The linq2db generated SQL is as follows

    DECLARE @IsYummy Bit -- Boolean
    SET     @IsYummy = 1
    DECLARE @IsJuicy Bit -- Boolean
    SET     @IsJuicy = 1
    
    UPDATE
        [f]
    SET
        [f].[IsYummy] = @IsYummy,
        [f].[IsJuicy] = @IsJuicy
    FROM
        [Food] [f]
            INNER JOIN [tempdb]..[#FoodUpdate] [t] ON [f].[Id] = [t].[Id]
    

    Full Source Code Example

    Entire program source used to test

    using LinqToDB.Data;
    using LinqToDB;
    using LinqToDB.Mapping;
    using System.IO;
    using System.Linq;
    using System.Diagnostics;
    using System;
    using System.Text;
    
    namespace TextToDbUpdate;
    
    internal class Program
    {
        static void Main(string[] args)
        {
            var options = new DataOptions()
                .UseSqlServer(@"Server=.\;Database=FoodStuff;Trusted_Connection=True;TrustServerCertificate=True")
                .UseTracing(TraceLevel.Info, ti => Debug // print out the generated queries
                    .WriteLineIf(ti.TraceInfoStep == TraceInfoStep.BeforeExecute, ti.SqlText));
    
            var listTxtPath = @"C:\Temp\list.txt";
            var recordCount = 2_000_000;
    
            for (int i = 1; i <= 5; i++)
            {
                Console.WriteLine($"[Test {i:00}] Test Setup (Dropping / Truncating / Creating / Populating Tables)");
                TestingSetup.Setup(options, listTxtPath, recordCount, 0.20);
                Console.WriteLine($"[Test {i:00}] Test Setup Completed. Starting Test...");
    
                var sw = Stopwatch.StartNew();
                var count = Test(options, listTxtPath);
                Console.Write($"[Test {i:00}] Test Completed, {count}/{recordCount} records updated in {sw.ElapsedMilliseconds}ms\n");
            }
        }
    
        public static int Test(DataOptions options, string listTxtPath)
        {
            using (var db = new DataConnection(options))
            {
                using var tempTable = db.CreateTempTable<TempTable>("#FoodUpdate");
                var foodTable = db.GetTable<Food>();
    
                // ------------- Insert Into TempTable -------------
                var lines = File
                    .ReadAllLines(listTxtPath)
                    .Select(long.Parse)
                    .Select(id => new TempTable { Id = id });
                
                tempTable.BulkCopy(lines); // <-- inserting into db happens here
                // -------------------------------------------------
    
                // ------------- Update from TempTable -------------
                var count1 = foodTable
                    .InnerJoin(tempTable, (f, t) => f.Id == t.Id, (f, t) => f)
                    .Set(f => f.IsYummy, true)
                    .Set(f => f.IsJuicy, true)
                    .Update(); // <-- update statement is generated and run
    
                // using "query" syntax
                // var count2 =
                //    (from food in foodTable
                //     from temp in tempTable.InnerJoin(t => t.Id == food.Id)
                //     select food)
                //    .Set(f => f.IsYummy, true)
                //    .Set(f => f.IsJuicy, true)
                //    .Update();
    
                // -------------------------------------------------
    
                return count1;
            }
        }
    }
    
    [Table("Food")]
    public class Food
    {
        [PrimaryKey, Identity] public long Id { get; set; }
    
        [Column, NotNull, DataType(DataType.VarChar, DbType = "VARCHAR(32)")]
        public string Name { get; set; }
    
        [Column, NotNull] public bool IsYummy { get; set; }
        [Column, NotNull] public bool IsJuicy { get; set; }
    }
    
    public class TempTable
    {
        // Adds a constraint on the TempTable which will prevent duplicate Ids
        // (remove if duplicates are allowed)
        [PrimaryKey] public long Id { get; set; }
    }
    
    public class TestingSetup
    {
        public static void Setup(DataOptions options, string listTxtPath, int recordCount, double updateProportion)
        {
            var random = new Random();
    
            // create the text file
            var insertLines = Enumerable
                .Range(1, recordCount)
                .Where(s => random.NextDouble() < updateProportion)
                .Select(n => n.ToString());
            File.WriteAllLines(listTxtPath, insertLines);
    
            // drop and recreate the table
            var data = Enumerable
                .Range(1, recordCount)
                .Select(_ => new Food
                {
                    Name = RandomString(random, 18),
                    IsYummy = random.NextDouble() < 0.1,
                    IsJuicy = random.NextDouble() < 0.1
                });
    
            using (var db = new DataConnection(options))
            {
                throw new Exception("Remove this line if you know what `db.DropTable<Food>()` means");
                try { db.DropTable<Food>(); } catch { }
                var t = db.CreateTable<Food>();
                t.BulkCopy(data);
            }
    
            static string RandomString(Random random, int capacity)
            {
                var sb = new StringBuilder(capacity);
                for (int i = 0; i < capacity; i++)
                    sb.Append((char)random.Next((int)'A', (int)'Z'));
                return sb.ToString();
            }
        }
    }