I have an Azure App Service where I need to activate TLS mutual authentication, and I ran into a completely unexpected issue. We need this service in order to upload images from IoT devices; the images are relatively small (<300 KB), and they are uploaded via multipart/form-data HTTP POST requests to this endpoint.
The problem: with client-side authentication enabled we can only upload files smaller than roughly 100 KB (I don't know the exact limit; I know 100,000 bytes works, and 150,000 bytes doesn't work). Anything larger than that, and we receive 403 Forbidden from the load balancer (the request never reaches our code). If we disable client-side authentication everything works as expected (the request reaches our code, which logs the request and then obviously fails, since the X-ARR-ClientCert header is missing, but at least the request goes through to our application).
I was unable to find any resources on this topic; Microsoft doesn't seem to document any size limitation when using client-side authentication, and we never intended to limit the file sizes. The thing that bothers me most is that the limitation seems to appear only when using client-side authentication, which makes no sense to me from a security perspective (if anything, the rules should be more relaxed when using client-side authentication).
Has anyone else encountered this? Any pointers would help; I'm totally stumped as to why it behaves like this, how I should investigate it further, or how I could go about addressing the issue.
LE: here's how it behaves when I try to upload a small file (100,000 bytes):
$ curl --cert my.crt --key my.key https://my-site.azurewebsites.net/Upload/uploadImage -F [email protected] --cookie-jar sys-cookies.jar --cookie sys-cookies.jar --tlsv1.2 -v
* Trying x.x.x.x:443...
* Connected to my-site.azurewebsites.net (x.x.x.x) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=US; ST=WA; L=Redmond; O=Microsoft Corporation; CN=*.azurewebsites.net
* start date: Mar 14 18:39:55 2022 GMT
* expire date: Mar 9 18:39:55 2023 GMT
* subjectAltName: host "my-site.azurewebsites.net" matched cert's "*.azurewebsites.net"
* issuer: C=US; O=Microsoft Corporation; CN=Microsoft Azure TLS Issuing CA 01
* SSL certificate verify ok.
> POST /Upload/uploadImage HTTP/1.1
> Host: my-site.azurewebsites.net
> User-Agent: curl/7.74.0
> Accept: */*
> Cookie: ARRAffinitySameSite=b[...]7; ARRAffinity=b[...]7
> Content-Length: 100193
> Content-Type: multipart/form-data; boundary=------------------------e6811f73870ec90c
>
* We are completely uploaded and fine
* TLSv1.2 (IN), TLS handshake, Hello request (0):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS handshake, CERT verify (15):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/json; charset=utf-8
< Date: Thu, 09 Jun 2022 06:46:01 GMT
< Server: Microsoft-IIS/10.0
< Access-Control-Allow-Origin: *
* Replaced cookie ARRAffinity="b[...]7" for domain my-site.azurewebsites.net, path /, expire 0
< Set-Cookie: ARRAffinity=b[...]7;Path=/;HttpOnly;Secure;Domain=my-site.azurewebsites.net
* Replaced cookie ARRAffinitySameSite="b[...]7" for domain my-site.azurewebsites.net, path /, expire 0
< Set-Cookie: ARRAffinitySameSite=b[...]7;Path=/;HttpOnly;SameSite=None;Secure;Domain=my-site.azurewebsites.net
< Transfer-Encoding: chunked
< X-Powered-By: ASP.NET
<
* Connection #0 to host my-site.azurewebsites.net left intact
{"error":"Error on uploading image!"}
The error is issued by our code, because I simply truncated a JPEG file to 100,000 bytes, so it's obviously not a valid image anymore.
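For anyone who wants to reproduce the size dependence, here is a minimal sketch of how truncated test files like these could be produced. The helper name `truncate_copy` and the file names are placeholders, not part of the original setup, and it assumes a `head` that supports `-c` (GNU coreutils, BSD, and BusyBox all do).

```shell
# Hypothetical helper: copy the first SIZE bytes of SRC into DEST.
# The result is usually not a valid image; it only exercises the
# size-dependent behavior of the endpoint.
truncate_copy() {
  src=$1
  size=$2
  dest=$3
  head -c "$size" "$src" > "$dest"
}

# Example usage (file names are placeholders):
#   truncate_copy original.jpg 100000 small.jpg
#   truncate_copy original.jpg 150000 large.jpg
```

Repeating this with sizes between 100,000 and 150,000 bytes would be one way to narrow down the exact threshold.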
For comparison, here's what happens with a large file (150,000 bytes):
$ curl --cert my.crt --key my.key https://my-site.azurewebsites.net/Upload/uploadImage -F [email protected] --cookie-jar sys-cookies.jar --cookie sys-cookies.jar --tlsv1.2 -v
* Trying x.x.x.x:443...
* Connected to my-site.azurewebsites.net (x.x.x.x) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=US; ST=WA; L=Redmond; O=Microsoft Corporation; CN=*.azurewebsites.net
* start date: Mar 14 18:39:55 2022 GMT
* expire date: Mar 9 18:39:55 2023 GMT
* subjectAltName: host "my-site.azurewebsites.net" matched cert's "*.azurewebsites.net"
* issuer: C=US; O=Microsoft Corporation; CN=Microsoft Azure TLS Issuing CA 01
* SSL certificate verify ok.
> POST /Upload/uploadImage HTTP/1.1
> Host: my-site.azurewebsites.net
> User-Agent: curl/7.74.0
> Accept: */*
> Cookie: ARRAffinitySameSite=b[...]7; ARRAffinity=b[...]7
> Content-Length: 150193
> Content-Type: multipart/form-data; boundary=------------------------8f78ee43724d4b8d
>
* TLSv1.2 (IN), TLS handshake, Hello request (0):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* Mark bundle as not supporting multiuse
< HTTP/1.1 403 Forbidden
< Content-Length: 0
< Connection: close
< Date: Thu, 09 Jun 2022 06:45:39 GMT
<
* we are done reading and this is set to close, stop send
* Closing connection 0
Notice how the load balancer actively terminates the request prematurely – the exchange is much shorter for the longer file, and cURL never finishes uploading the file; it never even reaches the 100,000 byte mark!
LE 2022-09: It turns out this is caused by a mismatch between the behavior of a legacy IIS engine and curl (not openssl). If you ever encounter this issue, just call curl with -H "Expect: 100-continue" (note the quotes, since the header value contains a space) in order to force it to send that header. You can safely ignore everything below this paragraph; I only left it here for historical purposes.
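A minimal sketch of that workaround as a reusable shell function follows. The function name `upload_with_expect` and the form field name `image` are assumptions for illustration (the actual field name depends on your endpoint); the curl options themselves are the ones used in the transcripts above, plus the forced header.

```shell
# Hypothetical wrapper around the workaround: force curl to send
# "Expect: 100-continue" so it announces the upload before sending the body.
# Arguments: client certificate, private key, file to upload, endpoint URL.
upload_with_expect() {
  cert=$1
  key=$2
  file=$3
  url=$4
  # "image" is an assumed form field name; replace it with the one
  # your endpoint expects.
  curl --cert "$cert" --key "$key" \
       --tlsv1.2 \
       -H "Expect: 100-continue" \
       -F "image=@${file}" \
       "$url"
}

# Example usage (all values are placeholders):
#   upload_with_expect my.crt my.key large.jpg \
#     https://my-site.azurewebsites.net/Upload/uploadImage
```

Adding `-v` to the call should show the `Expect: 100-continue` request header in the verbose output, which is a quick way to confirm the header is actually being sent.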
It turns out there's currently a glitch in Azure App Services which only manifests itself in these particular circumstances:
I have performed the following tests:
Platform | Distro | OpenSSL version | Result |
---|---|---|---|
PC | Windows/Postman GUI | (unknown) | OK |
Azure VM | Ubuntu | 1.1.1f (default) | OK |
PC (Windows WSL) | Ubuntu | 1.1.1f (default) | OK |
PC (Windows WSL) | Ubuntu | 1.1.1n (built from sources) | OK |
Raspberry Pi | Raspbian | 1.1.1n (default) | Fail |
Raspberry Pi | Raspbian | 1.1.1f (built from sources) | Fail |
Raspberry Pi | Raspbian | 1.1.1k (node) | Fail |
Raspberry Pi | Ubuntu | 3.0.2 | Fail |
Docker image on x86 | Alpine | 1.1.1o | Fail |
Docker image on x86 | Ubuntu | 3.0.2 | Fail |
Docker image on x86 | Gentoo | 1.1.1o | Fail |
For a limited time you can check your own ARM device using any flavor of Linux like so:
bash <(curl -s https://f002.backblazeb2.com/file/bogdan-stancescu/azure/test-appsp-mutual-tls.sh)
or, if you want to be more careful (as you should be), download the script at the URL, inspect it, and then execute it,
or you can test any of the containerized alternatives above with
DISTRO=alpine # or gentoo or ubuntu
docker pull docker.io/gutza/$DISTRO-appsp-mutual-tls:latest
docker run -it docker.io/gutza/$DISTRO-appsp-mutual-tls:latest
I submitted a bug report to Microsoft last night after a bit of struggle; I'll update this answer as things progress.
LE: I also submitted a bug to openssl, since that must be the root cause of the issue (Azure is probably too fussy about the TLS exchange, but openssl shouldn't behave differently on different platforms).
LE2: I published my experimental workbench, in case anyone finds it useful.
LE3: It turns out the bug is also replicable on x86 for particular Linux distros, after all!
LE4: After an initial assessment, the Microsoft support guy told me to use a five-year-old blog post describing an IIS issue from circa 2007. Or disable mutual TLS altogether. I laughed.
LE5: Here's Microsoft's final position on the issue:
Based on our call, I understood the issue to be your App Service dropping large requests (> 100,000 bytes) when the client certificate feature is enabled and the behavior was different on specific platforms. The app service side of the issue is my primary concern or at least that is where my troubleshooting starts and it is also in line with the main issue you posted on stackoverflow. Hence, I provided a doc that explains why large files can fail when you enable client certificates. The age of the documentation does not matter, we discuss this internally and that is the public-facing document at this time.
Most of the other tools you mentioned are 3rd party or out of scope and there is only a limit to how far we can test.
As for the bug itself, a workaround has been provided but there is no ETA at the moment.
We apologize for the inconvenience this has caused you.
I consider this closed. We're basically supposed to work around it; that's it.