We've previously been running a single Nginx reverse proxy between the internet and our microservices with the config:
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
Which had the requests piped through with headers like:
User            ->  ALB [nginx]             ->  App Servers
IP: 1.2.3.4         IP: 172.31.1.1              IP: n/a
                    Forwarded-For: 1.2.3.4      Real-IP: 1.2.3.4
                                                Forwarded-For: 1.2.3.4, 172.31.1.1
But now that we need to scale out the ALBs behind an Elastic LB, we're finding the extra layer of proxying problematic, e.g.:
User            ->  ELB                     ->  ALB [nginx]                         ->  App Servers
IP: 1.2.3.4         IP: 172.31.1.2              IP: 172.31.1.1                          IP: n/a
                    Forwarded-For: 1.2.3.4      Forwarded-For: 1.2.3.4, 172.31.1.2      Real-IP: 172.31.1.2
                                                                                        Forwarded-For: 1.2.3.4, 172.31.1.2
But as you can see, this is currently just setting X-Real-IP to the ELB's IP.
We need to be able to strip off the trusted proxies and send the proper User IP in the X-Real-IP header, as well as log the User IP rather than the ELB IP.
The GeoIP module has the geoip_proxy directives that define trusted proxies to ignore when determining the "true" IP, and I have to wonder if there's something similar in nginx or some other way to accomplish this?
TIA
Well the short answer is that there's not a simple config directive for this, and there's not a 100% bulletproof way to only trust certain proxies.
Let’s construct an example. My IP is 1.2.3.4, but I'm malicious and I want to pretend that I’m 5.6.7.8. I inject my own X-Forwarded-For: header via browser extension and the nginx box gets:
X-Forwarded-For: 5.6.7.8, 1.2.3.4, 172.31.1.1, 172.31.1.2
The naive approach is a map that simply grabs the first address in the list:
map $proxy_add_x_forwarded_for $x_real_ip {
    "~^([^,]+).*" $1;
    default $remote_addr;
}
All this does is peel off the first IP address from the X-Forwarded-For: header, which is all well and good if you don’t mind users spoofing other IP addresses via header injection. With the above example this method will return the spoofed 5.6.7.8.
Ideally we only want to strip off the two trusted proxies and use the first untrusted IP. For this we’re going to have to get a little creative with the regular expressions:
map $proxy_add_x_forwarded_for $x_real_ip {
    "~(?:^|, )([^,]+), (?:10\.|172\.(?:1[6-9]|2[0-9]|3[01])\.).*" $1;
    default $remote_addr;
}
About half of that regex is just dealing with the fact that the 172.16.0.0/12 network doesn't split cleanly on an octet boundary, but it does the trick. For the above example it correctly returns 1.2.3.4.
However, if an outside attacker somehow knows that this kludgey config is in use and also knows what the trusted networks are, they could set the following header to get around it:
X-Forwarded-For: 5.6.7.8, 172.16.0.1
Which ends up at the proxy as:
X-Forwarded-For: 5.6.7.8, 172.16.0.1, 1.2.3.4, 172.31.1.1, 172.31.1.2
Since that regex is essentially reading left-to-right and returning the IP to the left of the first trusted IP, in this specific case it will return the spoofed 5.6.7.8 IP. However, this is quite a corner case and is acceptable for my particular use, YMMV.
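Both cases can be checked outside of nginx. This is a minimal sketch in Python, assuming the pattern behaves the same under Python's re module as under nginx's PCRE (it uses no PCRE-only syntax, and both do an unanchored leftmost search). The function name x_real_ip and the remote_addr value passed in are made up for illustration.

```python
import re

# Python copy of the pattern from the second map block.
TRUSTED_LEFT = re.compile(
    r"(?:^|, )([^,]+), (?:10\.|172\.(?:1[6-9]|2[0-9]|3[01])\.).*"
)


def x_real_ip(forwarded_for: str, remote_addr: str) -> str:
    """Mimic the map: capture group 1 on a match, else the default."""
    match = TRUSTED_LEFT.search(forwarded_for)
    return match.group(1) if match else remote_addr


# Header from the first example: spoofed entry followed by the real chain.
spoofed = "5.6.7.8, 1.2.3.4, 172.31.1.1, 172.31.1.2"
# Header from the bypass example: attacker also injects a "trusted" address.
bypass = "5.6.7.8, 172.16.0.1, 1.2.3.4, 172.31.1.1, 172.31.1.2"

print(x_real_ip(spoofed, "172.31.1.2"))  # 1.2.3.4
print(x_real_ip(bypass, "172.31.1.2"))   # 5.6.7.8
```

The second call shows the weakness: the match is found at the leftmost position that works, so an injected private address earlier in the list wins.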
Caveat: You may get an error saying “you should increase map_hash_bucket_size”, which means that you need to increase that value to accommodate that bigass regex. However, the docs on that are a bit fiddly and talk about “alignment” being important, so if you’ve not otherwise set that value somewhere else I would suggest doubling the value referenced in the message. In my case I doubled it from 64 to 128.
IMHO this requires actually parsing the header and applying real logic, so it would either need to be patched into nginx or written into a module. Essentially porting the same logic that the GeoIP module uses for geoip_proxy and geoip_proxy_recursive.
Alternatively, you could make your application proxy-aware and implement the logic there. If you know how to properly wrangle IPs and subnets it should be a cinch. Unfortunately I don't have that option available to me in this case.
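For anyone who does have that option, here is a rough sketch of what the application-side logic could look like in Python, using the standard ipaddress module. It is an illustration, not a drop-in implementation: the TRUSTED_NETWORKS list and the client_ip function are hypothetical, and the ranges are just the two private networks the regex above covers. The idea is to walk the chain right-to-left and stop at the first address that is not a trusted proxy, so anything injected further left is never consulted.

```python
import ipaddress

# Hypothetical trusted ranges; adjust to wherever your proxies actually live.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def client_ip(forwarded_for: str, remote_addr: str) -> str:
    """Return the rightmost address in the proxy chain that is not trusted."""
    # Full chain as seen by the app: header entries, then the direct peer.
    chain = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    chain.append(remote_addr)

    candidate = remote_addr
    for hop in reversed(chain):
        try:
            ip = ipaddress.ip_address(hop)
        except ValueError:
            break  # unparseable entry: stop and keep the last good address
        candidate = hop
        if not any(ip in net for net in TRUSTED_NETWORKS):
            break  # first untrusted hop from the right is the client
    return candidate


# The bypass header from above no longer works when read right-to-left.
print(client_ip("5.6.7.8, 172.16.0.1, 1.2.3.4, 172.31.1.1", "172.31.1.2"))  # 1.2.3.4
```

If every hop is trusted, this returns the leftmost address in the chain. Whether that is the right fallback depends on your setup.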
Thanks: If it weren't for membear on IRC reminding me that regex capture groups are valid inside map{} blocks I'd probably still be spinning my wheels.
Shameless Plug: I originally wrote a blog post on this before I remembered that I asked here. It's mostly the same, but also has a more detailed breakdown of that big regex.