Search code examples
c#mysqltransferrate

C# - SQL - Reading every row in a database, datamining the row, then saving results in another database - How to increase speed


My problem: Code works great, but the speed is far too slow for the number of rows it needs to process.

What I am doing: I initiate a COUNT(*) to pull total number of rows (which last night was ~58000) and I use that to create the loop to do the following: pull two columns of data from that row, datamine one column for text patterns.

Once I had completed that, I search a second table to see if that individual by username is already existing - if they exist, I update their row. If not, I add a new one.

There are 44 columns of data, one being the name the other 43 storing values of my datamining results.

In about 8 hours, it has completed 26500 out of the 58000 when it first started (in that same period, the table has grown to ~100000, but I am not worried about that).

Is there a better method to increase read/write rate?

Portion of my code - I have removed many of the int declarations and Regex.Matches as they are copies of the first with altered Match values.

azCheck is to determine if the message even contains anything we are looking for, it is remains '0', then we don't bother with the last portion of the code.

using (new MySqlConnection(ConnectiongString))
            {
                using (MySqlCommand cmd = new MySqlCommand("select count(*) from messages", connection))
                {
                    using (MySqlDataReader reader = cmd.ExecuteReader())
                    {
                        StringBuilder sb = new StringBuilder();
                        while (reader.Read())
                        {
                            sb.Append(reader.GetInt32(0).ToString());
                        }
                        total_messages = int.Parse(sb.ToString());
                    }
                }
            }
            Console.WriteLine(total_messages.ToString());
            connection.Close();
            for (int i = 1; i <= total_messages; i++)
            {
                connection.Open();
                using (new MySqlConnection(ConnectiongString))
                {
                    using (MySqlCommand cmd = new MySqlCommand("select * from messages WHERE id="+i+"", connection))
                    {
                        using (MySqlDataReader reader = cmd.ExecuteReader())
                        {
                            StringBuilder sb = new StringBuilder();
                            while (reader.Read())
                            {
                                username = reader["username"].ToString();
                                message = reader["message"].ToString();
                            }
                        }
                    }
                }
                connection.Close();

                Console.Write("\r{0}   ", i);

                int aiCount = 0;
                aiCount += Regex.Matches(message, "ai", RegexOptions.IgnoreCase).Count;
                azCheck += aiCount;

//There are ~42 of the regex.matches after the first one.

MySqlCommand cmd1 = connection.CreateCommand();
                connection.Open();
                cmd1.CommandText = "SELECT username FROM users";
                cmd1.CommandType = CommandType.Text;
                cmd1.Connection = connection;
                MySqlDataReader dr = cmd1.ExecuteReader();
                while (dr.Read())
                {
                    if (dr[0].ToString() == username)
                    {
                        check++;
                    }
                }
                connection.Close();

if (check == 0)
                {
                    MySqlConnection connection2 = new MySqlConnection(ConnectiongString);
                    connection2.Open();
                    try
                    {
                        MySqlCommand cmd2 = connection2.CreateCommand();
                        cmd2.CommandText = "INSERT INTO users (username,aiCount) VALUES (@username,@aiCount)";
                        cmd2.Parameters.AddWithValue("@username", username);
                        cmd2.Parameters.AddWithValue("@aiCount", aiCount);
                        cmd2.ExecuteNonQuery();
                        connection2.Close();


                    }
                    catch (Exception)
                    {
                        throw;
                    }

} else {

int aiCount_old = 0;

if (azCheck > 0)
                    {

//Here we are taking the existing values from this users row, 
//which we then add the new values from above and save.

                        MySqlConnection connection4 = new MySqlConnection(ConnectiongString);
                        connection4.Open();
                        try
                        {
                            MySqlCommand cmd2 = connection4.CreateCommand();
                            cmd2.CommandType = CommandType.Text;
                            cmd2.CommandText = "SELECT * from users WHERE username = @username";
                            cmd2.Parameters.AddWithValue("@username", username);
                            MySqlDataReader reader = cmd2.ExecuteReader();
                            while (reader.Read())
                            {
                                aiCount_old = Convert.ToInt32(reader["aiCount"].ToString());
}

                        }
                        catch (Exception)
                        {
                            throw;
                        }
                        connection4.Close();

                        aiCount += aiCount_old;

MySqlConnection connection5 = new MySqlConnection(ConnectiongString);

                        connection5.Open();
                        try
                        {
                            MySqlCommand cmd4 = connection5.CreateCommand();
                            cmd4.CommandType = CommandType.Text;
                            cmd4.CommandText = "UPDATE users SET aiCount = @aiCount WHERE LOWER(LTRIM(RTRIM(username))) = @username";
                            cmd4.Parameters.AddWithValue("@username", username.Trim().ToLower());
                            cmd4.Parameters.AddWithValue("@aiCount", aiCount.ToString());
                              cmd4.ExecuteNonQuery();
                            Console.WriteLine("User updated.");

                        }
                        catch (Exception ex)
                        {
                            throw;
                        }
                        connection5.Close();

Solution

  • You have several inefficiencies that I can spot right off the bat.

    You are constantly opening and closing your connection string. This is probably your biggest bottleneck. Open the connection once, then close it once when all your processing is done, and you'll probably see a massive increase in performance.

    You also use different connection objects when one will do, which will reduce your need to open and close connections.

    You also seem to have a misunderstanding of the use of "using" on connection objects. I see using (new MySqlConnection(ConnectiongString)), but that code is completely useless, as it doesn't do anything but initialize a connection object, which is immediately lost since it is not assigned to an object.

    Since you are processing everything sequentially, use connection as your connection object in every case, opening it only at the start of your processing, and closing it when processing is complete, then executing the Dispose method (the point of the using statement).

    This change alone might reduce processing time by an order of magnitude.

    NOTE: You will need a separate connection for your datareader if you need to do updates or another query while the datareader is open.