I'm working with the awesome DuckDB.NET library and C# for this example. I'm specifically working with the ADO.NET provider.
My database contains the following table:
CREATE TABLE tbl01 (
Id INTEGER,
TextListTest TEXT[],
DateTest DATE
);
In the program each record is encapsulated by a class:
class Record
{
public int Id { get; set; }
public List<string> TextListTest { get; set; };
public DateTime TextListTest { get; set; };
}
and is appended to a List<Record>
. This list gets
very large so I would like to avoid the per-row overhead of INSERT
statements in a
loop. The documentation says that if I absolutely must use
inserts in this manner I should also wrap them in calls of BEGIN TRANSACTION
and COMMIT
. I
really don't want to miss out on insert performance here. Is there another approach I can use with
the library I'm using?
I noticed in the DuckDB.NET
sample with the
LowLevelBindingsSample()
method that I could use prepared statements but I'm not sure that if that would confer any performance benefits.
Is there an approach I'm missing - perhaps the appender? If someone could provide an example using the 3 specific data
types in the above table that would be greatly appreciated (I'm having trouble figuring out the
LIST
column).
using DuckDB.NET.Data;
namespace DuckTest;
class Record
{
public int Id { get; set; }
public List<string> TextListTest { get; set; }
public DateTime DateTest { get; set; }
}
class Program
{
public static void Main(string[] args)
{
// pretend this is a really big list!
List<Record> recordList = new List<Record>
{
new Record { Id = 1, TextListTest = new List<string> { "Ball", "Horse" }, DateTest = new DateTime(1994, 12, 3) },
new Record { Id = 2, TextListTest = new List<string> { "Matthew", "Giorgi is cool!" }, DateTest = new DateTime(1998, 11, 28) },
new Record { Id = 3, TextListTest = new List<string> { "Red", "Black", "Purple" }, DateTest = new DateTime(1999, 9, 13) },
new Record { Id = 4, TextListTest = new List<string> { "Cat" }, DateTest = new DateTime(1990, 2, 5) },
};
using (var duckDBConnection = new DuckDBConnection("Data Source=db01.duckdb"))
{
duckDBConnection.Open();
var command = duckDBConnection.CreateCommand();
command.CommandText = "CREATE TABLE tbl01 ( Id INTEGER, TextListTest TEXT[], DateTest DATE );";
var executeNonQuery = command.ExecuteNonQuery();
// I could do this in a loop but there's probably a better way...
command.CommandText = "INSERT INTO tbl01 VALUES (1, ['Ball', 'Horse'], '1994-12-03');";
executeNonQuery = command.ExecuteNonQuery();
}
}
}
I would be willing to use the low level bindings library if needed.
The library has support for appenders so that should be the fastest way to import data.
Check out the Bulk Data Loading for more details and options.