c# memory-leaks sqlconnection multitasking ef-core-2.2

Where is my memory leak and how do I fix it? Why does memory consumption keep increasing?


I have been struggling for a few days with growing memory consumption in a .NET Core 2.2 console application, and I have run out of ideas about what else to improve.

In my application I have a constructor that triggers the StartUpdatingAsync method:

public MenuViewModel()
        {
            if (File.Exists(_logFile))
                File.Delete(_logFile);

            try
            {
                StartUpdatingAsync("basic").GetAwaiter().GetResult();
            }
            catch (ArgumentException aex)
            {
                Console.WriteLine($"Caught ArgumentException: {aex.Message}");
            }

            Console.ReadKey();
        }

StartUpdatingAsync creates a repository instance ('repo') and uses it to fetch from the DB the list of objects to be updated (around 200k):

private async Task StartUpdatingAsync(string dataType)
        {
            _repo = new DataRepository();
            // Assign the query result directly; the extra empty list was discarded immediately.
            List<SomeModel> some_list = _repo.GetAllToBeUpdated();

            await IterateStepsAsync(some_list, _step, dataType);
        }

Within IterateStepsAsync we fetch the updates, merge them with the existing data and update the DB. Inside each iteration of the while loop I create new instances of all the classes and lists, to be sure that the old ones release their memory, but it didn't help. I also call GC.Collect() at the end of each iteration, which doesn't help either. Please note that the method below triggers lots of parallel Tasks, but they are supposed to be cleaned up within it, am I right?

private async Task IterateStepsAsync(List<SomeModel> some_list, int step, string dataType)
        {
            List<Area> areas = _repo.GetAreas();
            int counter = 0;

            while (counter < some_list.Count)
            {
                _repo = new DataRepository();
                _updates = new HttpUpdates();
                List<Task> tasks = new List<Task>();
                List<VesselModel> vessels = new List<VesselModel>();
                SemaphoreSlim throttler = new SemaphoreSlim(_degreeOfParallelism);

                for (int i = counter; i < step && i < some_list.Count; i++) // stay in range on the last, shorter batch
                {
                    int iteration = i;
                    bool skip = false;

                    if (dataType == "basic" && (some_list[iteration].Mmsi == 0 || !some_list[iteration].Speed.HasValue)) //if could not be parsed with "full"
                        skip = true;

                    tasks.Add(Task.Run(async () =>
                    {
                        string scraped = "";
                        await throttler.WaitAsync();
                        try
                        {
                            if (!skip)
                            {
                                Model model= await _updates.ScrapeSingleModelAsync(some_list[iteration].Mmsi);
                                while (Updating)
                                {
                                    await Task.Delay(1000);
                                }
                                if (model != null)
                                {
                                    lock (((ICollection)vessels).SyncRoot)
                                    {
                                        vessels.Add(model);

                                        scraped = BuildData(model);
                                    }
                                }
                            }
                            else
                            {
                                //do nothing
                            }
                        }
                        catch (Exception ex)
                        {
                            Log("Scrape error: " + ex.Message);
                        }
                        finally
                        {
                            while (Updating)
                            {
                                await Task.Delay(1000);
                            }
                            Console.WriteLine("Updates for " + counter++ + " of " + some_list.Count + scraped);

                            throttler.Release();
                        }

                    }));
                }

                try
                {
                    await Task.WhenAll(tasks);
                }
                catch (Exception ex)
                {
                    Log("Critical error: " + ex.Message);
                }
                finally
                {
                    _repo.UpdateModels(vessels, dataType, counter, some_list.Count, _step);

                    step = step + _step;

                    GC.Collect();
                }
            }
        }
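
For comparison, the batching loop above can be sketched in a more defensive form. This is only a minimal sketch under stated assumptions, not the asker's code and not the fix for the leak (which turned out to be elsewhere): `BatchRunner`, `RunAsync` and the `work` delegate are hypothetical names standing in for the scraping call. It shows three things: the batch's upper bound is clamped to the list length, the shared counter is incremented with `Interlocked` rather than `counter++`, and the `SemaphoreSlim` is disposed per batch. The Tasks themselves do not need to be disposed explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BatchRunner
{
    // Hypothetical helper: processes items in batches of batchSize, running at
    // most maxParallel work items at once. Returns the number of items processed.
    public static async Task<int> RunAsync<T>(
        IReadOnlyList<T> items, int batchSize, int maxParallel, Func<T, Task> work)
    {
        int processed = 0;

        for (int start = 0; start < items.Count; start += batchSize)
        {
            // Clamp the upper bound so the last, shorter batch stays in range.
            int end = Math.Min(start + batchSize, items.Count);

            // SemaphoreSlim is IDisposable; 'using' releases it after each batch.
            using (var throttler = new SemaphoreSlim(maxParallel))
            {
                var tasks = new List<Task>(end - start);

                for (int i = start; i < end; i++)
                {
                    T item = items[i]; // copy to a local so each lambda captures its own item

                    tasks.Add(Task.Run(async () =>
                    {
                        await throttler.WaitAsync();
                        try
                        {
                            await work(item);
                        }
                        finally
                        {
                            // counter++ is not atomic when many tasks run it concurrently.
                            Interlocked.Increment(ref processed);
                            throttler.Release();
                        }
                    }));
                }

                // All tasks of this batch have finished before the semaphore is disposed.
                await Task.WhenAll(tasks);
            }
        }

        return processed;
    }
}
```

A call such as `await BatchRunner.RunAsync(some_list, _step, _degreeOfParallelism, m => ScrapeAndStoreAsync(m))` would then replace the nested loops, where `ScrapeAndStoreAsync` is again a placeholder for the per-item work.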

Inside the method above we call _repo.UpdateModels, which updates the DB. I tried two approaches, one using EF Core and one using SqlConnection, both with similar results. Both are shown below.

EF Core

internal List<VesselModel> UpdateModels(List<Model> vessels, string dataType, int counter, int total, int _step)
        {

            for (int i = 0; i < vessels.Count; i++)
            {
                Console.WriteLine("Parsing " + i + " of " + vessels.Count);

                Model existing = _context.Vessels.Where(v => v.id == vessels[i].Id).FirstOrDefault();
                if (vessels[i].LatestActivity.HasValue)
                {
                    existing.LatestActivity = vessels[i].LatestActivity;
                }
                //and similar parsing several times, as above
            }

            Console.WriteLine("Saving ...");
            _context.SaveChanges();
            return new List<Model>(_step);
        }

SqlConnection

internal List<VesselModel> UpdateModels(List<Model> vessels, string dataType, int counter, int total, int _step)
        {
            if (vessels.Count > 0)
            {
                using (SqlConnection connection = GetConnection(_connectionString))
                using (SqlCommand command = connection.CreateCommand())
                {
                    connection.Open();
                    StringBuilder querySb = new StringBuilder();

                    for (int i = 0; i < vessels.Count; i++)
                    {
                        Console.WriteLine("Updating " + i + " of " + vessels.Count);
                        //PARSE

                        VesselAisUpdateModel existing = new VesselAisUpdateModel();

                        if (vessels[i].Id > 0)
                        {
                            //find existing
                        }

                        if (existing != null)
                        {
                            //update for basic data
                            querySb.Append("UPDATE dbo." + _vesselsTableName + " SET Id = '" + vessels[i].Id+ "'");
                            if (existing.Mmsi == 0)
                            {
                                if (vessels[i].MMSI.HasValue)
                                {
                                    querySb.Append(" , MMSI = '" + vessels[i].MMSI + "'");
                                }
                            }
                            //and similar parsing several times, as above

                            querySb.Append(" WHERE Id= " + existing.Id+ "; ");

                            querySb.AppendLine();
                        }
                    }

                    try
                    {
                        Console.WriteLine("Sending SQL query to " + counter);
                        command.CommandTimeout = 3000;
                        command.CommandType = CommandType.Text;
                        command.CommandText = querySb.ToString();
                        command.ExecuteNonQuery();
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine(ex.Message);
                    }
                    finally
                    {
                        connection.Close();
                    }
                }
            }
            return new List<Model>(_step);
        }

The main problem is that after tens or hundreds of thousands of updated models, the memory consumption of my console application keeps increasing, and I have no idea why.

SOLUTION: my problem was inside the ScrapeSingleModelAsync method, where I was using HtmlAgilityPack incorrectly. I was able to track this down thanks to cassandrad.


Solution

  • Your code is messy, with a huge number of different objects whose lifetimes are unclear. It is hardly possible to figure out the problem just by looking at it.

    Consider using profiling tools, for example Visual Studio's Diagnostic Tools. They will help you find which objects are living too long on the heap. The overview of its memory-profiling features is highly recommended reading.

    In short, you need to take two snapshots and look at which objects are taking the most memory. Let's look at a simple example.

    int[] first = new int[10000];
    Console.WriteLine(first.Length);
    int[] second = new int[9999];
    Console.WriteLine(second.Length);
    Console.ReadKey();
    

    Take the first snapshot after your function has run at least once. In my case, I took the snapshot right after the first large array had been allocated.

    After that, let your app run for some time so that the difference in memory usage becomes noticeable, then take the second memory snapshot.

    You'll notice that another snapshot is added, showing how much memory usage has changed since the previous one. To get more specific information, click one of the blue labels of the latest snapshot to open the snapshot comparison.


    Following my example, we can see that the count of int arrays has changed. By default int[] wasn't visible in the table, so I had to uncheck Just My Code in the filter options.
    So, this is what needs to be done. Once you have figured out which objects increase in count or size over time, you can locate where these objects are created and optimize that operation.
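
    Before reaching for snapshots, it can also help to confirm in code that the managed heap really grows between iterations. The following is a minimal sketch under stated assumptions, not a replacement for the profiler: `HeapProbe`, `GrowthAfter` and `LeakyStep` are hypothetical names, and `LeakyStep` deliberately keeps every array reachable to imitate a leak. The only framework call relied on is `GC.GetTotalMemory(true)`, which forces a full collection before reporting the heap size.

```csharp
using System;
using System.Collections.Generic;

public static class HeapProbe
{
    // Hypothetical "leak": every array stays reachable through this static list.
    private static readonly List<int[]> Retained = new List<int[]>();

    // Managed heap size in bytes, measured after a full, blocking collection.
    public static long MeasureBytes() => GC.GetTotalMemory(true);

    // Imitates a leaking iteration by retaining roughly 40 KB per call.
    public static void LeakyStep() => Retained.Add(new int[10000]);

    // Runs 'step' the given number of times and returns how many bytes the
    // managed heap grew by, counting only memory that survived collection.
    public static long GrowthAfter(int rounds, Action step)
    {
        long before = MeasureBytes();
        for (int i = 0; i < rounds; i++)
        {
            step();
        }
        return MeasureBytes() - before;
    }
}
```

    Calling `HeapProbe.GrowthAfter(100, HeapProbe.LeakyStep)` should report growth in the megabyte range, while a step that keeps no references should stay close to zero. Logging this figure after each batch shows whether the growth is in managed objects at all, which is when comparing snapshots as described above pays off.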