Tags: c#, .net, optimization, task-parallel-library, parallel.foreach

Optimising the process of a Huge List<T> in C#


I'm working on a scheduling algorithm that generates/assigns time-slots to a List of Recipients based on the following restrictions:

  • Max Recipients per Minute
  • Max Recipients per Hour

Suppose the delivery start time is 2018-10-17 9:00 AM and we have 19 recipients, with a maximum of 5 per minute and 10 per hour. The output should be:

  1. 5 Recipients will be scheduled on 2018-10-17 9:00 AM
  2. 5 Recipients will be scheduled on 2018-10-17 9:01 AM
  3. 5 Recipients will be scheduled on 2018-10-17 10:00 AM
  4. 4 Recipients will be scheduled on 2018-10-17 10:01 AM
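
To make the arithmetic behind this example concrete, here is a minimal sketch that computes a recipient's slot directly from its position in the list. The names `SlotMath`, `SlotFor` and `Summarize` are hypothetical, and the sketch assumes that hourly windows are counted from the start time and that `maxPerHour` is at most 60 times `maxPerMinute`; it is not the algorithm from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SlotMath
{
    // Slot for the recipient at zero-based position 'index'.
    // Assumption: maxPerHour <= 60 * maxPerMinute, so an hourly quota fits inside its hour.
    public static DateTime SlotFor(DateTime start, int index, int maxPerMinute, int maxPerHour)
    {
        int hourOffset = index / maxPerHour;                     // full hourly quotas before this recipient
        int minuteOffset = (index % maxPerHour) / maxPerMinute;  // full minute quotas inside that hour
        return start.AddHours(hourOffset).AddMinutes(minuteOffset);
    }

    // Number of recipients per slot, in chronological order.
    public static List<KeyValuePair<DateTime, int>> Summarize(
        DateTime start, int recipients, int maxPerMinute, int maxPerHour)
    {
        return Enumerable.Range(0, recipients)
            .GroupBy(i => SlotFor(start, i, maxPerMinute, maxPerHour))
            .Select(g => new KeyValuePair<DateTime, int>(g.Key, g.Count()))
            .ToList();
    }
}
```

Calling `SlotMath.Summarize(new DateTime(2018, 10, 17, 9, 0, 0), 19, 5, 10)` gives four slots: 9:00 AM (5), 9:01 AM (5), 10:00 AM (5) and 10:01 AM (4), which matches the list above. Each slot depends only on the recipient's index, so no shared counter is needed.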

The algorithm produces correct results, but it works as follows:

  • First, it generates a list of time-slots (time-windows) that exactly fits the number of recipients, based on the restrictions mentioned above.
  • Then it assigns each set/group of recipients to the next available entry in the list of time-slots.
  • Each time-slot carries a counter that is incremented for every recipient added to it, so I can track the number of recipients in each time-slot and respect the max per minute/hour restrictions.

The process is simplified in the code snippet below. I'm using a while loop to iterate; with 500K recipients it takes 28 minutes to finish! I tried to use Parallel.ForEach, but I couldn't figure out how to apply it in this case.

DateTime DeliveryStart = DateTime.Now;
//This list has DateTime: Time-windows  values starting from DeliveryStart to the Max value of the time needed to schedule the Recipients
var listOfTimeSlots = new List<Tuple<DateTime, bool, int>>();
//List of Recipients with Two types of data: DateTime to tell when its scheduled and int value refers to the Recipient's ID
var ListOfRecipients = new List<Tuple<DateTime, int>>();
List<Tuple<int, DateTime>> RecipientsWithTimeSlots= new List<Tuple<int, DateTime>>();
int noOfRecipients = ListOfRecipients.Count;

int Prevhour = 0, _AddedPerHour = 0, Prevday = 0;
// Scheduling restrictions 
int _MaxPerHour = 5400, _MaxPerMinute = 90;
int i = 0;
int indexStart = 0;

// ...
//     ...
//           Code to fill listOfTimeSlots ListOfRecipients with Data

while (noOfRecipients > 0)
{
    var TimeStamp = listOfTimeSlots[i];

    int hour = TimeStamp.Item1.Hour;
    int day = TimeStamp.Item1.Day;

    if (Prevhour == 0)
    {
        Prevhour = hour;
        Prevday = day;
    }
    if (Prevhour != hour)
    {
        Prevhour = hour;
        _AddedPerHour = 0;
    }

    if (_AddedPerHour >= _MaxPerHour)
    {
        var tmpItem = listOfTimeSlots.Where(l => l.Item1.Hour == hour && l.Item1.Day == day).LastOrDefault();
        int indexOfNextItem = listOfTimeSlots.LastIndexOf(tmpItem) + 1;
        i = indexOfNextItem;
        _AddedPerHour = 0;
        continue;
    }
    else
    {
        int endIndex;


        endIndex = _MaxPerMinute > noOfRecipients ? noOfRecipients : _MaxPerMinute;

        if (endIndex > Math.Abs(_AddedPerHour - _MaxPerHour))
            endIndex = Math.Abs(_AddedPerHour - _MaxPerHour);

        var RecipientsToIteratePerMinute = ListOfRecipients.GetRange(indexStart, endIndex);

        foreach (var item in RecipientsToIteratePerMinute)
        {
            RecipientsWithTimeSlots.Add(new Tuple<int, DateTime>(item.Item2, TimeStamp.Item1));
            listOfTimeSlots[i] = new Tuple<DateTime, bool, int>(TimeStamp.Item1, true, listOfTimeSlots[i].Item3 + 1);
            _AddedPerHour++;
        }

        indexStart += endIndex;
        noOfRecipients -= endIndex;
        i++;

    }
}

I simplified the code here to keep it easy to follow. All I want is to speed up the while loop or replace it with a Parallel.ForEach.

Note: the while loop itself is not simplified; this is exactly how it works.

Any help or suggestion is appreciated.


Solution

  • Here is a different approach. It creates the groups of ids first, then assigns them the date based on the requirements.

    First, a class to represent the groups (avoid the tuples):

    public class RecipientGroup
    {       
        public RecipientGroup(DateTime scheduledDateTime, IEnumerable<int> recipients)
        {
            ScheduledDateTime= scheduledDateTime;
            Recipients = recipients;
        }
    
        public DateTime ScheduledDateTime { get; private set; }
        public IEnumerable<int> Recipients { get; private set; }
    
        public override string ToString()
        {
            return $"Date: {ScheduledDateTime.ToShortDateString()} {ScheduledDateTime.ToLongTimeString()}, count: {Recipients.Count()}";
        }
    }
    

    Then a class to iterate through the groups. You will see why this is needed later:

    public class GroupIterator
    {        
        public GroupIterator(DateTime scheduledDateTime)
        {
            ScheduledDateTime = scheduledDateTime;
        }
    
        public DateTime ScheduledDateTime { get; set; }
        public int Count { get; set; }
    }
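
The iterator class exists because `Enumerable.Aggregate` threads a single accumulator through the whole sequence: whatever the lambda returns becomes the first argument of the next call. Here is a minimal sketch of that pattern, with a hypothetical `RunningState` class standing in for `GroupIterator` and a deliberately simple rule (each batch takes the next minute):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for GroupIterator: mutable state carried between steps.
public class RunningState
{
    public int Minute { get; set; }  // minutes consumed so far
    public int Count { get; set; }   // recipients placed so far
}

public static class AggregateSketch
{
    // Walks the batches once, updating the same state object at each step.
    public static RunningState Walk(IEnumerable<int[]> batches)
    {
        return batches.Aggregate(new RunningState(), (state, batch) =>
        {
            state.Count += batch.Length;
            state.Minute += 1;
            return state; // passed back in as 'state' for the next batch
        });
    }
}
```

The seed object plays the role of loop variables in an ordinary `foreach`; a plain loop with two local variables would behave the same way.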
    

    Now, the code:

    DateTime DeliveryStart = new DateTime(2018, 10, 17);
            
    //List of Recipients (fake populate function)
    IEnumerable<int> allRecipients = PopulateRecipients();            
    
    // Scheduling restrictions 
    int maxPerMinute = 90;
    int maxPerHour = 270;
    
    //Creates groups broken down by the max per minute.  
    var groupsPerMinute = allRecipients
            .Select((s, i) => new { Value = s, Index = i })
            .GroupBy(x => x.Index / maxPerMinute)
            .Select(group => group.Select(x => x.Value).ToArray());
    
    //This will be the resulting groups
    var deliveryDateGroups = new List<RecipientGroup>();
    
    //Perform an aggregate run on the groups using the iterator
    groupsPerMinute.Aggregate(new GroupIterator(DeliveryStart), (iterator, ids) => 
    {
        var nextBreak = iterator.Count + ids.Count();
        if (nextBreak >= maxPerHour)
        {
            //Reaches or exceeds the hourly limit, split
            var difference = nextBreak - maxPerHour;
            var groupSize = ids.Count() - difference;
            //This group completes the batch
            var group = new RecipientGroup(iterator.ScheduledDateTime, ids.Take(groupSize));
            deliveryDateGroups.Add(group);
            var newDate = iterator.ScheduledDateTime.AddHours(1).AddMinutes(-iterator.ScheduledDateTime.Minute);
            //Nothing left over: start the next hour with an empty count
            if (difference == 0)
                return new GroupIterator(newDate);
            //Add new group with remaining recipients at the start of the next hour.
            var stragglers = new RecipientGroup(newDate, ids.Skip(groupSize));
            deliveryDateGroups.Add(stragglers);
            //The stragglers occupy that minute, so the next group starts one minute later
            return new GroupIterator(newDate.AddMinutes(1)) { Count = difference };
        }                    
        else
        {
            var group = new RecipientGroup(iterator.ScheduledDateTime, ids);
            deliveryDateGroups.Add(group);
            iterator.ScheduledDateTime = iterator.ScheduledDateTime.AddMinutes(1);
            iterator.Count += ids.Count();
            return iterator;
        }                      
    });
    
    //Output minute group count
    Console.WriteLine($"Group count: {deliveryDateGroups.Count}");
    
    //Groups by hour
    var byHour = deliveryDateGroups.GroupBy(g => new DateTime(g.ScheduledDateTime.Year, g.ScheduledDateTime.Month, g.ScheduledDateTime.Day, g.ScheduledDateTime.Hour, 0, 0));
    
    Console.WriteLine($"Hour Group count: {byHour.Count()}");
    foreach (var group in byHour)
    {
         Console.WriteLine($"Date: {group.Key.ToShortDateString()} {group.Key.ToShortTimeString()}; Count: {group.Count()}; Recipients: {group.Sum(g => g.Recipients.Count())}");
    }
    

    Output:

    Group count: 5556
    Hour Group count: 1852
    Date: 10/17/2018 12:00 AM; Count: 3; Recipients: 270
    Date: 10/17/2018 1:00 AM; Count: 3; Recipients: 270
    Date: 10/17/2018 2:00 AM; Count: 3; Recipients: 270
    Date: 10/17/2018 3:00 AM; Count: 3; Recipients: 270
    Date: 10/17/2018 4:00 AM; Count: 3; Recipients: 270
    Date: 10/17/2018 5:00 AM; Count: 3; Recipients: 270

    ... and so on for all 1852 groups.

    This takes about 3 seconds to complete.

    I am sure there are edge cases I have not covered. I wrote this in a hurry, so check it against your own limits before relying on it.
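
One way to sidestep the splitting logic and its edge cases is to chunk twice: first into hourly quotas, then into per-minute quotas inside each hour. The sketch below is an alternative to the Aggregate approach, not the code that produced the output above. It assumes .NET 6 or later (for `Enumerable.Chunk`), that `maxPerHour` is at most 60 times `maxPerMinute`, and that hours are counted from the start time; `Batch` and `ChunkScheduler` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: one time slot and the recipient ids assigned to it.
public sealed class Batch
{
    public Batch(DateTime slot, int[] ids) { Slot = slot; Ids = ids; }
    public DateTime Slot { get; }
    public int[] Ids { get; }
}

public static class ChunkScheduler
{
    public static List<Batch> Schedule(
        IEnumerable<int> recipients, DateTime start, int maxPerMinute, int maxPerHour)
    {
        var result = new List<Batch>();
        int hour = 0;
        // Outer chunks hold at most maxPerHour ids, one chunk per hour.
        foreach (int[] hourBatch in recipients.Chunk(maxPerHour))
        {
            int minute = 0;
            // Inner chunks hold at most maxPerMinute ids, one chunk per minute.
            foreach (int[] minuteBatch in hourBatch.Chunk(maxPerMinute))
            {
                result.Add(new Batch(start.AddHours(hour).AddMinutes(minute), minuteBatch));
                minute++;
            }
            hour++;
        }
        return result;
    }
}
```

With 500,000 ids, 90 per minute and 270 per hour, this produces 5,556 batches spread over 1,852 hours, the same counts as in the output above. Because every chunk is cut to size up front, no group ever needs to be split across an hour boundary.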