Search code examples
c#mysql.data

Insert 100.000 entries to MySQL with tasks - C#


I am having some troubles with my code. I am trying to insert about 100.000 items in a Mysql table using Tasks to speed up the process. Here is my code:

List < Task > tasks = new List < Task > ();
int counter = 0;
foreach(var folder in dirs) {
    object lockTarget = new object();

    tasks.Add(Task.Factory.StartNew(() => {
        try {
                comm.CommandText = "INSERT INTO webdata(url,title) VALUES(?sUrl,?sTitle,) ON DUPLICATE KEY UPDATE url=?sUrl,title=?sTitle";
                comm.Parameters.Add("?sUrl", MySql.Data.MySqlClient.MySqlDbType.VarChar).Value = url;
                comm.Parameters.Add("?sTitle", MySql.Data.MySqlClient.MySqlDbType.VarChar).Value = title;

                var mysql_return = comm.ExecuteNonQueryAsync();

                lock(lockTarget) {
                    counter++;
                    Console.WriteLine("\rProcessing {0} of {1} - {2} {3}", counter, dirs.Length, folder, mysql_return.Status);
                }

        } catch (MySql.Data.MySqlClient.MySqlException ex) {
            Console.WriteLine(ex.Message);
        }    
    }));
}

Task.Factory.StartNew(() => {
    Task.WaitAll(tasks.ToArray());
    Console.WriteLine("Finished");
    conn.Close();
});

So, when i run my code, some of the Tasks (mysql_return) are returning rantocompletion and some Faulted, thus only 1/2 of my entries actually inserted to the DB.

I could use ExecuteNonQuery and no Task but it will take ages to insert so many data. Is the problem due to hardware(CPU can't keep up with thousands of spawned tasks) or due to my code?

Any ideas? Thanks in advance


Solution

  • Problem1: comm is modified in multiple threads it seems, that has to be a local variable.

    Problem2: You're not waiting for the task returned by ExecuteNonQueryAsync and thus Task returned by StartNew will not wait for the completion of the ExecuteNonQueryAsync to be finished.

    Eventually you close the connection assuming that all are done but they aren't. They all are started, but not completed.

    You need to use await the result of ExecuteNonQueryAsync(for that use async lambda) and also call Task.UnWrap or use Task.Run which gives you UnWrap for free.

    So your code will become something like the following:

    tasks.Add(Task.Run(async () => //Note Task.Run and async lambda
     {
          try
          {
            var comm = new WhateverCommand();
            ...
    
            var mysql_return = await comm.ExecuteNonQueryAsync();//Note the await
            ...
          }
          catch (MySql.Data.MySqlClient.MySqlException ex) 
          {
            Console.WriteLine(ex.Message);
          }    
    }));