I am writing a project related to mass data fetching. Currently I am using .NET Framework 4.8 and the Mysql package to start connection and insert data to the database server.
I am going to insert around 400, 000 line/second. I am concern that the SQL connection may become the bottleneck of my program. I would like to know if I create a multitthread connection with the sql and insert the data using a consumer queue, would it be faster and is it worth it(pros and cons)?
In my intuitive thought it would be faster but I am not sure how much performance it can provide with respect of the overhead for threads. I am not a SQL expert so it would be nice if someone could explain the pros and cons of opening multiple connections to a SQL on multiple threads.
Rumors, opinions, hearsay, facts, version-dependent benchmarks, some personal experience, etc...
Multiple threads will improve throughput, but with limits:
Batching:
LOAD DATA
is the fastest way to INSERT
lots of rows from a single thread at a single time. But if you include the cost of writing the file to LOAD
, that may make it effectively slower than batched inserting.INSERT
is a close second. But it caps out at "hundreds" of rows, when it hits either some limit or "diminishing returns".INSERT
query. So it (or LOAD DATA
) is worth using for high speed ingestion. (Source: many different timed tests.)Source of data:
What happens after loading the data? Surely this is not a write-only-never-read table.
PARTITIONing
is rarely useful, except for eventual purging of old data. See Partition