c# asp.net async-await parallel-processing httprequest

How to make multiple API calls faster?


I am requesting data from a Products API, but it returns only 20 records at a time. The endpoint looks like this:

https://www.someDummyAPI.com/Api/Products?offset=0&count=20

Note: I can't change the count, it will always be 20.

That is, the response from this endpoint contains 20 records, starting at offset 0. After that I have to increase the offset by 20 to get the next 20 records, and so on (in total it's about 1500 records, so I have to make approximately 700 requests).

After getting all the data, I insert it into a SQL database using a stored procedure (this is a separate process).

So my question is: how can I speed up the fetching process? I thought about running tasks in parallel, but I need to get the results from each response.

Currently, the process looks like this:

    protected async void FSL_Sync_btn_Click(object sender, EventArgs e)
    {
        int offset = 0;
        int total= 0;
        bool isFirst = true;
        DataTable resTbl = CreateDt();
        while (offset < total || offset == 0)
        {
            try
            {
                var data = await GetFSLData(offset.ToString(),"Products");

                JObject Jresult = JObject.Parse(data);

                if (isFirst)
                {
                    Int32.TryParse(Jresult.SelectToken("total").ToString(),out total);
                    isFirst = false;
                }
                // Function to chain up data in DataTable
                resTbl = WriteInDataTable(resTbl, Jresult);

                offset += 20;
            }
            catch(Exception ex)
            {
                var msg = ex.Message;
            }
        }
    }

So the process flow I am taking is:

  1. Get data from the API (say, the first 20 records).
  2. Add it to the existing DataTable using the WriteInDataTable function.
  3. Insert the data from this resTbl DataTable into the SQL database (a completely different process, not shown here).

I haven't used parallel tasks yet (I don't even know whether they are the right solution here), so I would appreciate any help.


Solution

  • If you have upgraded to the .NET 6 platform, you could consider using the Parallel.ForEachAsync method to parallelize the GetFSLData invocations. This method requires an IEnumerable<T> sequence as source. You can create this sequence using LINQ (the Enumerable.Range method). To avoid any problems associated with the thread-safety of the DataTable class, you can store the JObject results in an intermediate ConcurrentQueue<JObject> collection, and defer the creation of the DataTable until all the data have been fetched and are locally available. You may need to also store the offset associated with each JObject, so that the results can be inserted in their original order. Putting everything together:

    protected async void FSL_Sync_btn_Click(object sender, EventArgs e)
    {
        // The real total is unknown until the first response arrives.
        int total = Int32.MaxValue;
        // Lazy sequence of offsets (0, 20, 40, ...) that ends once the known total is reached.
        IEnumerable<int> offsets = Enumerable
            .Range(0, Int32.MaxValue)
            .Select(n => checked(n * 20))
            .TakeWhile(offset => offset < Volatile.Read(ref total));

        var options = new ParallelOptions() { MaxDegreeOfParallelism = 10 };
        var results = new ConcurrentQueue<(int Offset, JObject JResult)>();
        await Parallel.ForEachAsync(offsets, options, async (offset, ct) =>
        {
            string data = await GetFSLData(offset.ToString(), "Products");
            JObject Jresult = JObject.Parse(data);
            if (offset == 0)
            {
                // Read the total record count from the first page.
                Volatile.Write(ref total,
                    Int32.Parse(Jresult.SelectToken("total").ToString()));
            }
            results.Enqueue((offset, Jresult));
        });

        // Build the DataTable sequentially, in the original offset order.
        DataTable resTbl = CreateDt();
        foreach (var (offset, Jresult) in results.OrderBy(r => r.Offset))
        {
            resTbl = WriteInDataTable(resTbl, Jresult);
        }
    }
    

    The Volatile.Read and Volatile.Write calls are required because the total variable might be accessed by multiple threads in parallel.
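
    As a possible alternative to sharing the total variable between threads, the first page could be fetched on its own and the remaining offsets computed up front from the total it reports. The sketch below shows only that arithmetic. PageOffsets and Remaining are hypothetical names that do not appear in the code above, and the default page size of 20 is taken from the question.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class PageOffsets
    {
        // Hypothetical helper: offsets of every page after the first one,
        // given the "total" reported by the first response.
        public static IReadOnlyList<int> Remaining(int total, int pageSize = 20)
        {
            if (pageSize <= 0)
                throw new ArgumentOutOfRangeException(nameof(pageSize));

            // Ceiling division: number of pages needed to cover "total" records.
            // The long cast avoids overflow for totals close to Int32.MaxValue.
            int pageCount = (int)(((long)total + pageSize - 1) / pageSize);
            if (pageCount <= 1)
                return Array.Empty<int>();

            // Page 0 was already fetched, so start from page 1.
            return Enumerable.Range(1, pageCount - 1)
                .Select(page => checked(page * pageSize))
                .ToList();
        }
    }
    ```

    For example, PageOffsets.Remaining(95) yields the offsets 20, 40, 60 and 80. Such a list could then be passed to Parallel.ForEachAsync as the source instead of the TakeWhile sequence, under the assumption that the total does not change while the pages are being fetched.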

    In order to get optimal performance, you may need to adjust the MaxDegreeOfParallelism configuration, according to the capabilities of the remote server and your internet connection.

    Note: This solution is not memory-efficient, because all the data is held in memory in two different formats at the same time (as JObject instances and as DataTable rows).
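
    If upgrading to .NET 6 is not an option, a similar cap on concurrent requests can be approximated with SemaphoreSlim and Task.WhenAll. This is a minimal sketch, not part of the original answer: ThrottledFetcher, FetchAllAsync and the fetchPage delegate are hypothetical names, with fetchPage standing in for a call such as GetFSLData(offset.ToString(), "Products"). It assumes the list of offsets is known in advance, for example because the first page has been fetched separately to learn the total.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    public static class ThrottledFetcher
    {
        // Hypothetical helper: fetches all pages with at most
        // maxConcurrency requests in flight at any moment.
        public static async Task<string[]> FetchAllAsync(
            IEnumerable<int> offsets,
            Func<int, Task<string>> fetchPage,
            int maxConcurrency = 10)
        {
            using (var throttle = new SemaphoreSlim(maxConcurrency))
            {
                async Task<string> FetchOneAsync(int offset)
                {
                    // Wait for a free slot before starting the request.
                    await throttle.WaitAsync();
                    try
                    {
                        return await fetchPage(offset);
                    }
                    finally
                    {
                        // Free the slot even if the request failed.
                        throttle.Release();
                    }
                }

                // ToList starts every task; the semaphore limits how many
                // of them are past WaitAsync at the same time.
                List<Task<string>> tasks = offsets.Select(FetchOneAsync).ToList();

                // Results are returned in the same order as the tasks,
                // which is the order of the offsets.
                return await Task.WhenAll(tasks);
            }
        }
    }
    ```

    Because Task.WhenAll returns the results in task order, the responses arrive already sorted by offset, so no intermediate queue or sorting step is needed before building the DataTable. As with MaxDegreeOfParallelism, the maxConcurrency value is a tuning parameter rather than a recommendation.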