Tags: c#, asp.net-core, httprequest, .net-8.0

How do I access the request stream when using HttpClient to build a proxy?


I am building a simple proxy server in .NET 8.

The process is as follows:

  • A client POSTs a request (possibly a large one)
  • My proxy server gets that request and needs to process the incoming stream to actually change it (required for our application)
  • The proxy server then builds a new HttpRequestMessage with a StreamContent
  • The incoming request's stream is read in chunks of 64K (set in appsettings)
  • Each chunk is transformed with our specific process
  • The transformed chunks then need to be written to the StreamContent
  • Once the entire incoming request is processed, the transformed payload is POSTed with the HttpClient
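The chunked process in the list above can be sketched as a single read, transform, write loop that does not care where the transformed bytes end up. This is a minimal sketch under assumptions: `ChunkTransformer.CopyTransformedAsync` is a hypothetical helper, and the transform is assumed to take one chunk as `ReadOnlyMemory<byte>` and return a new `byte[]` (a stand-in for `TransformTheIncomingBuffer`).

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkTransformer
{
    // Reads `source` in chunks of at most `chunkSize` bytes, transforms each chunk,
    // and writes the result to `destination` before reading the next chunk.
    // Only one input chunk (plus its transformed copy) is held at a time.
    public static async Task<long> CopyTransformedAsync(
        Stream source,
        Stream destination,
        int chunkSize,
        Func<ReadOnlyMemory<byte>, byte[]> transform, // hypothetical transform signature
        CancellationToken cancellationToken = default)
    {
        var buffer = new byte[chunkSize];
        long totalWritten = 0;
        int bytesRead;

        // Single read site: the loop ends when ReadAsync returns 0 (end of stream).
        while ((bytesRead = await source.ReadAsync(buffer, cancellationToken)) > 0)
        {
            // Pass only the bytes actually read in this iteration.
            byte[] transformed = transform(buffer.AsMemory(0, bytesRead));
            await destination.WriteAsync(transformed, cancellationToken);
            totalWritten += transformed.LongLength;
        }

        return totalWritten;
    }
}
```

With a `MemoryStream` as the destination this reproduces the buffered behaviour described next; memory is only saved when the destination is the outgoing network stream itself.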

The issue I have is that with StreamContent I must write ALL of the transformed data into the underlying stream and then set its Position back to zero before I can POST it. This means (if I understand correctly) that the entire "new request" is held in memory on my proxy server.

If I use the obsolete HttpWebRequest for my new request, I can get the RequestStream and process my incoming message in chunks which I write directly to the RequestStream. It seems like this is a much better approach as it should induce less memory pressure.

Am I missing something here?

Following is the code I have for using StreamContent:

/// <summary>
/// Uses the HttpRequestMessage / HttpResponseMessage to communicate with
/// the proxied API. This requires the entire Request stream to be assembled
/// before it can be sent to the API.
/// </summary>
/// <param name="clientHttpContext">The HttpContext for this request.</param>
/// <returns>An HttpResponseMessage which exposes its stream for processing.</returns>
private async Task<HttpResponseMessage> SendToProxiedAPIWithStreamContent(HttpContext clientHttpContext)
{
    byte[] incomingRequestBuffer = new byte[_settings.ChunkSize];
    HttpResponseMessage? response = null;

    //
    // Create a MemoryStream (upstreamRequestStream) that collects the transformed chunks for the upstream server.
    //
    using (MemoryStream upstreamRequestStream = new MemoryStream(_settings.ChunkSize))
    {
        //
        // Wrap upstreamRequestStream in a StreamContent so it can be sent as the request body.
        //
        StreamContent upstreamContent = new StreamContent(upstreamRequestStream);

        //
        // Make a Request to send to the proxied API.
        //
        HttpRequestMessage upstreamRequest = new HttpRequestMessage();
        upstreamRequest.Method = new HttpMethod(clientHttpContext.Request.Method);
        upstreamRequest.RequestUri = new Uri($"{_settings.UpstreamUrl}/api/postdata");
        upstreamRequest.Content = upstreamContent;
        upstreamRequest.Content.Headers.ContentType = System.Net.Http.Headers.MediaTypeHeaderValue.Parse(clientHttpContext.Request.Headers.ContentType.First());

        //
        // Read the incoming request body chunk by chunk, pass each chunk to the Transform method,
        // and write the result to upstreamRequestStream.
        //
        int incomingBufferBytesRead = await clientHttpContext.Request.Body.ReadAsync(incomingRequestBuffer, 0, incomingRequestBuffer.Length);

        while (incomingBufferBytesRead > 0)
        {
            //
            // Process (transform) a single chunk prior to its going to the proxied API.
            //
            byte[] transformedBuffer = TransformTheIncomingBuffer(incomingRequestBuffer, incomingBufferBytesRead);

            //
            // Write the transformed data to the upstreamRequestStream that is wrapped in the StreamContent.
            //
            await upstreamRequestStream.WriteAsync(transformedBuffer, 0, transformedBuffer.Length);

            //
            // Clear my transformed buffer and get the next chunk from the incoming request body.
            //
            Array.Clear(transformedBuffer);
            incomingBufferBytesRead = await clientHttpContext.Request.Body.ReadAsync(incomingRequestBuffer, 0, incomingRequestBuffer.Length);
        }

        //
        // Rewind upstreamRequestStream so StreamContent sends it from the beginning.
        // This is the problem: the entire transformed payload is now in memory,
        // so large payloads will overwhelm the proxy.
        // How can we feed chunks to the upstream request's StreamContent instead?
        //
        upstreamRequestStream.Position = 0;

        //
        // Send this request on to the httpClient that is bound to the proxied API.
        // But by now we have read and transformed the entire incoming request body,
        // which may be huge. How do we send it in chunks as we transform it?
        //
        response = await _httpClient.SendAsync(upstreamRequest);
    }

    return response;
}

And here is my code using the obsolete HttpWebRequest:

    /// <summary>
    /// Uses the obsolete WebRequest / WebResponse to communicate with
    /// the proxied API.  This allows us access to the upstream request stream.
    /// </summary>
    /// <param name="clientHttpContext">The HttpContext for this request.</param>
    /// <returns>An HttpWebResponse which exposes its stream for processing.</returns>
    private async Task<HttpWebResponse> SendToProxiedAPIWithWebRequest(HttpContext clientHttpContext)
    {
        byte[] incomingRequestBuffer = new byte[_settings.ChunkSize];
        HttpWebResponse? response = null;
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.CreateHttp($"{_settings.UpstreamUrl}/api/postdata");
        webRequest.Method = "POST";
        webRequest.ContentType = "application/json";

        using (var upstreamRequestStream = webRequest.GetRequestStream())
        {
            int incomingBufferBytesRead = await clientHttpContext.Request.Body.ReadAsync(incomingRequestBuffer, 0, incomingRequestBuffer.Length);
            long contentLength = 0;

            while (incomingBufferBytesRead > 0)
            {
                //
                // Process (transform) a single chunk prior to its going to the proxied API.
                //
                byte[] transformedBuffer = TransformTheIncomingBuffer(incomingRequestBuffer, incomingBufferBytesRead);
                contentLength += transformedBuffer.LongLength;
                //
                // Write the transformed data directly to the outgoing request upstreamRequestStream.
                // (Note: there is no way to do this using the HttpClient)
                //
                upstreamRequestStream.Write(transformedBuffer, 0, transformedBuffer.Length);
                //
                // Clear my transformed buffer and get the next chunk from the input stream.
                //
                Array.Clear(transformedBuffer);
                incomingBufferBytesRead = await clientHttpContext.Request.Body.ReadAsync(incomingRequestBuffer, 0, incomingRequestBuffer.Length);
            }

            webRequest.ContentLength = contentLength;
        }
        //
        // Send the request to the proxied API and get the HttpWebResponse.
        //
        response = (HttpWebResponse)await webRequest.GetResponseAsync();
        return response;
    }

Is there any way to use the HttpRequestMessage and still process my data in chunks?

I would like to use the newer approach since, from what I understand, it does a much better job of reusing connections, and in a high-volume proxy server that would definitely be an advantage.
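Connection reuse with HttpClient comes mainly from keeping one long-lived client (and therefore one handler connection pool) for the whole process instead of creating a client per request; IHttpClientFactory is the other common option. A minimal sketch, assuming a static shared instance; `UpstreamClient`, the base URL and the two-minute `PooledConnectionLifetime` are illustrative assumptions, not values from the post:

```csharp
using System;
using System.Net.Http;

public static class UpstreamClient
{
    // One HttpClient (and so one SocketsHttpHandler connection pool)
    // for the lifetime of the process, shared by every proxied request.
    public static readonly HttpClient Shared =
        Create(new Uri("https://upstream.example/")); // hypothetical upstream URL

    public static HttpClient Create(Uri upstreamBaseAddress) =>
        new HttpClient(new SocketsHttpHandler
        {
            // Recycle pooled connections periodically so DNS changes are noticed;
            // the value is an illustrative assumption.
            PooledConnectionLifetime = TimeSpan.FromMinutes(2),
        })
        {
            BaseAddress = upstreamBaseAddress,
        };
}
```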

Thanks in advance for any guidance.


Solution

  • You can't do this with the standard HttpContent classes, as they all expect the data to be ready upfront. What you need is a class that can pull the data from somewhere else.

    Here is one possible solution. It takes a Func which can be used to stream the data at the exact point that HttpClient demands it, and also optionally accepts a Func to supply a length if available.

    using System.Net; // TransportContext
    
    // An HttpContent that produces its body on demand: streamWriter is called with
    // the outgoing stream only when HttpClient serializes the request.
    public class PullingStreamContent(Func<Stream, CancellationToken, Task> streamWriter, Func<long?>? getLength = null)
        : HttpContent
    {
        private readonly Func<Stream, CancellationToken, Task> _streamWriter = streamWriter;
        private readonly Func<long?>? _getLength = getLength;
        
        protected override Task SerializeToStreamAsync(Stream stream, TransportContext? context) =>
            _streamWriter(stream, default);
    
        protected override Task SerializeToStreamAsync(Stream stream, TransportContext? context, CancellationToken cancellationToken) =>
            _streamWriter(stream, cancellationToken);
    
        // Reports a length only if the optional callback supplies one;
        // returning false leaves the content length unknown.
        protected override bool TryComputeLength(out long length)
        {
            var l = _getLength?.Invoke();
            length = l.GetValueOrDefault();
            return l.HasValue;
        }
    }
    

    The length lambda is optional. If you don't provide it, you won't get a Content-Length header, and the client will use chunked transfer encoding instead.
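The effect of the optional length callback can be checked without a network through `Headers.ContentLength`, which HttpContent fills from `TryComputeLength` when no explicit value has been set. The sketch repeats `PullingStreamContent` (without the redundant fields) so that it compiles on its own; `PullingStreamContentDemo` is a hypothetical helper that also counts how often the writer runs, to show that the body is only pulled when the content is serialized.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Same shape as the PullingStreamContent above.
public class PullingStreamContent(Func<Stream, CancellationToken, Task> streamWriter, Func<long?>? getLength = null)
    : HttpContent
{
    protected override Task SerializeToStreamAsync(Stream stream, TransportContext? context) =>
        streamWriter(stream, default);

    protected override Task SerializeToStreamAsync(Stream stream, TransportContext? context, CancellationToken cancellationToken) =>
        streamWriter(stream, cancellationToken);

    // Returning false means "length unknown", so no Content-Length is computed.
    protected override bool TryComputeLength(out long length)
    {
        long? known = getLength?.Invoke();
        length = known.GetValueOrDefault();
        return known.HasValue;
    }
}

public static class PullingStreamContentDemo
{
    // Builds a content whose writer emits a fixed payload and calls onWrite each time it runs.
    public static PullingStreamContent Create(byte[] payload, bool advertiseLength, Action onWrite)
    {
        Func<long?>? getLength = null;
        if (advertiseLength)
        {
            getLength = () => payload.Length;
        }

        return new PullingStreamContent(
            async (stream, ct) =>
            {
                onWrite();
                await stream.WriteAsync(payload, ct);
            },
            getLength);
    }
}
```

On an HTTP/1.1 connection a body without a Content-Length goes out with `Transfer-Encoding: chunked`; HTTP/2 has no chunked encoding and simply sends the body as DATA frames.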

    Then you can pass in a lambda that pulls the data from the incoming request and writes it directly to the outgoing one.

    private async Task<HttpResponseMessage> SendToProxiedAPIWithStreamContent(HttpContext clientHttpContext)
    {
        using var upstreamContent = new PullingStreamContent(async (outputStream, ct) =>
        {
            var incomingRequestBuffer = new byte[_settings.ChunkSize];
            var body = clientHttpContext.Request.Body;
    
            int incomingBufferBytesRead;
            while ((incomingBufferBytesRead = await body.ReadAsync(incomingRequestBuffer, ct)) > 0)
            {
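                // perhaps a reusable transform buffer as well??
                // Assumes an overload of TransformTheIncomingBuffer that accepts ReadOnlyMemory<byte>.
                // AsMemory(0, n) covers the n bytes just read; AsMemory(n) would instead skip them.
                var transformedBuffer = TransformTheIncomingBuffer(incomingRequestBuffer.AsMemory(0, incomingBufferBytesRead));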
                await outputStream.WriteAsync(transformedBuffer, ct);
            }
        });
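        // Forward the client's Content-Type only if it sent one (First() would throw on an empty header).
        if (clientHttpContext.Request.ContentType is { } contentType)
            upstreamContent.Headers.ContentType = MediaTypeHeaderValue.Parse(contentType);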
    
        using var upstreamRequest = new HttpRequestMessage(HttpMethod.Parse(clientHttpContext.Request.Method), $"{_settings.UpstreamUrl}/api/postdata");
        upstreamRequest.Content = upstreamContent;
        var response = await _httpClient.SendAsync(upstreamRequest);
        return response;
    }
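To exercise the pulling approach without ASP.NET Core or a real upstream, the sketch below replaces `HttpContext.Request.Body` with a plain `Stream` and plugs a stub `HttpMessageHandler` into HttpClient that echoes the request body back. `EchoHandler`, `ProxySketch.SendTransformedAsync`, the placeholder URL and the transform delegate are hypothetical test scaffolding; `PullingStreamContent` is repeated in condensed form (no length callback) so the block compiles on its own.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

// Condensed PullingStreamContent: length unknown, body pulled on demand.
public class PullingStreamContent(Func<Stream, CancellationToken, Task> streamWriter) : HttpContent
{
    protected override Task SerializeToStreamAsync(Stream stream, TransportContext? context) =>
        streamWriter(stream, default);

    protected override Task SerializeToStreamAsync(Stream stream, TransportContext? context, CancellationToken ct) =>
        streamWriter(stream, ct);

    protected override bool TryComputeLength(out long length) { length = 0; return false; }
}

// Hypothetical stand-in for the upstream API: echoes the received body back.
public sealed class EchoHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        byte[] received = await request.Content!.ReadAsByteArrayAsync(ct);
        return new HttpResponseMessage(HttpStatusCode.OK) { Content = new ByteArrayContent(received) };
    }
}

public static class ProxySketch
{
    public static async Task<byte[]> SendTransformedAsync(
        HttpClient client, Stream incomingBody, int chunkSize,
        Func<ReadOnlyMemory<byte>, byte[]> transform, CancellationToken ct = default)
    {
        using var content = new PullingStreamContent(async (output, token) =>
        {
            var buffer = new byte[chunkSize];
            int bytesRead;
            // Each chunk is transformed and written as soon as it is read.
            while ((bytesRead = await incomingBody.ReadAsync(buffer, token)) > 0)
            {
                await output.WriteAsync(transform(buffer.AsMemory(0, bytesRead)), token);
            }
        });
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        using var request = new HttpRequestMessage(HttpMethod.Post, "http://upstream.invalid/api/postdata") { Content = content };
        using HttpResponseMessage response = await client.SendAsync(request, ct);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsByteArrayAsync(ct);
    }
}
```

The stub handler buffers the body to echo it, so this only checks correctness; with the real SocketsHttpHandler the writer writes into the connection stream as HttpClient sends the request.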
    

    Other points to note

    • Handing an HttpResponseMessage back to the caller is a bit of a code smell. This function should really process the response itself and dispose of it immediately.
    • I've used AsMemory to pass around segments of arrays.
    • A standard read loop has only one place that does the read, of the form:
      while ((bytesRead = DoRead()) > 0) { /* process the chunk */ }
      This removes the need to repeat the read statement.
    • HttpMethod.Parse returns singletons, rather than doing new HttpMethod on every run.
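For the first point, one way to avoid handing the HttpResponseMessage back to the caller is to relay it inside the same method and let `using` dispose it. A minimal sketch, assuming the client's response is exposed as a status-code setter plus a body `Stream` (in ASP.NET Core these would be `HttpContext.Response.StatusCode` and `HttpContext.Response.Body`); `ResponseRelay.RelayAsync` and `FixedResponseHandler` are hypothetical. `HttpCompletionOption.ResponseHeadersRead` makes SendAsync return once the headers arrive, so the upstream body is copied through rather than buffered first.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ResponseRelay
{
    // Sends the upstream request, forwards the status code, then streams the
    // upstream body into clientResponseBody. The response is disposed here,
    // so no caller has to remember to do it.
    public static async Task RelayAsync(
        HttpClient client, HttpRequestMessage upstreamRequest,
        Action<int> setStatusCode, Stream clientResponseBody, CancellationToken ct = default)
    {
        using HttpResponseMessage response = await client.SendAsync(
            upstreamRequest, HttpCompletionOption.ResponseHeadersRead, ct);

        // Set the status before writing any body bytes (ASP.NET Core sends headers on the first write).
        setStatusCode((int)response.StatusCode);
        await response.Content.CopyToAsync(clientResponseBody, ct);
    }
}

// Hypothetical upstream stub, only for trying the relay without a network.
public sealed class FixedResponseHandler(HttpStatusCode status, string body) : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct) =>
        Task.FromResult(new HttpResponseMessage(status) { Content = new StringContent(body) });
}
```

In an ASP.NET Core handler the call might look like `await ResponseRelay.RelayAsync(_httpClient, upstreamRequest, code => ctx.Response.StatusCode = code, ctx.Response.Body);`.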