Search code examples
sqlitef#microsoft.data.sqlite

SQLite batch insert not speeding up with Transactions


I am trying to make a helper function to make inserts to a SQLite database easy to generalize.

For example:

  member __.RegisterUsers (users : UserData array) =
    insertOrReplace "User" users [|
      ("id", fun u -> u.id)
      ("nick", fun u -> u.nick)
      ("is_verified", fun u -> u.is_verified)
      ("is_banned", fun u -> u.is_banned)
      ("subscriptions", fun u -> u.num.subscriptions)
      ("subscribers", fun u -> u.num.subscribers)
      ("total_posts", fun u -> u.total_posts)
      ("original_nick", fun u -> u.original_nick)
    |]

My implementation is as follows:

  let insertOrReplace (table:string) (items:'T array) (args:(string * ('T -> obj)) array)=
    let fieldNames = args |> Seq.map fst |> Seq.toArray
    let parameterNames = fieldNames |> Array.mapi (fun i name -> $"@{name}{i}")

    let commandText =
      StringBuilder()
        .Append($"INSERT OR REPLACE INTO {table} (")
        .AppendJoin(", ", fieldNames)
        .Append(") VALUES (")
        .AppendJoin(", ", parameterNames)
        .Append(");")
        .ToString()

    use transaction = connection.BeginTransaction()
    use command = new SqliteCommand(commandText, connection, transaction)

    let parameters =
      parameterNames |> Array.map (fun name ->
        let parameter = command.CreateParameter()
        parameter.ParameterName <- name
        command.Parameters.Add(parameter)
      )
      
    try
      let modifiedFields = items |> Array.map (fun item -> 
          Array.iter2 (fun (name, selector) (parameter:SqliteParameter) -> parameter.Value <- selector item) args parameters
          command.ExecuteNonQuery()
      )
      transaction.Commit()
      modifiedFields |> Array.sum |> Ok
    with
    | _ as ex -> Error ex.Message

This uses Microsoft's Microsoft.Data.Sqlite library

The table definition I'm trying to insert to:

CREATE TABLE IF NOT EXISTS User (
    id TEXT PRIMARY KEY,
    nick TEXT NOT NULL,
    is_verified INTEGER NOT NULL,
    is_banned INTEGER NOT NULL,
    subscriptions INTEGER NOT NULL,
    subscribers INTEGER NOT NULL,
    total_posts INTEGER NOT NULL,
    original_nick TEXT NOT NULL
);

This "works" in the sense that the data is inserted correctly, but it's extremely slow and takes about 75-110 ms per record. For a single record, that's negligible, but I'm trying to insert hundreds at a time and it's becoming a serious bottleneck.

Things I've tried:

  • Batching multiple inserts in each execution. This did not affect the throughput at all
  • Reusing parameters. No change
  • Transactions. Every source I've seen online points to transactions as the solution to this problem, but adding them in did nothing. I tried directly copying Microsoft's bulk insert guide, but to no avail

Solution

  • The problem was not with any of the code shown here, but with the several other tables that all had foreign key references to the User table. There were several hundred thousand records with foreign key references to Users, and apparently checking foreign key integrity for that many was just too much.

    Data integrity is not a huge concern for this project and missing key references are not a problem, so I decided to remove all foreign key constraints. I left primary key constraints in place, as that is important for my use case.

    After rebuilding the database with no foreign keys, insert times are down to about 10-30ms per batch as opposed to the 10-30+ seconds previously.

    For anyone in the future with a similar problem where dropping foreign key constraints is acceptable for their project, I found this guide (archive).