I've googled the crap out of this problem and come up with nothing, so hopefully you guys can help.
Goal
Configure a reverse proxy (using Apache's mod_proxy) to give access to an internal PHP application over the Internet, using mod_proxy_html to rewrite the URLs in the application.
Problem Description
Consider landing.php with only this code:
<a href="redirect.php">redirect.php</a>
and redirect.php with only:
<?php
header("Location:http://internal.example.com/landing.php");
?>
with this snippet from my httpd.conf
UseCanonicalName On
LoadFile /usr/lib64/libxml2.so
LoadModule proxy_html_module modules/mod_proxy_html.so
LoadModule xml2enc_module modules/mod_xml2enc.so
<VirtualHost *:80>
ServerName example.com
<Proxy *>
Order deny,allow
Allow from all
AllowOverride None
</Proxy>
ProxyPass / http://internal.example.com/
ProxyPassReverse / http://internal.example.com/
# Commenting out the next line fixes the problem, but I need it for rewriting
ProxyHTMLLinks a href
ProxyHTMLEnable On
RequestHeader unset Accept-Encoding
</VirtualHost>
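Side note on the snippet: it has no ProxyHTMLURLMap directive. ProxyHTMLLinks only tells mod_proxy_html which element/attribute pairs contain URLs; the actual rewriting comes from ProxyHTMLURLMap rules, so presumably one was trimmed for brevity. A minimal sketch of the rewriting part inside the <VirtualHost> block, assuming the aim is to turn absolute links to the internal host into links on the public host (the mapping is an illustrative assumption, not the original config):

```apache
# Which element/attribute pairs hold URLs (only <a href> here)
ProxyHTMLLinks a href
# Run the HTML rewriting filter on proxied HTML responses
ProxyHTMLEnable On
# Assumed mapping: rewrite absolute links to the internal host
# into site-relative links served through this proxy
ProxyHTMLURLMap http://internal.example.com/ /
```

Relative links such as href="redirect.php" in landing.php need no mapping, because the browser already resolves them against the public URL.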
When I go to http://example.com/landing.php and click "redirect.php", it should take me back to the landing.php page. Instead, I get "This connection was reset" in Firefox, or "No data received" in Chrome. (FYI, going to http://internal.example.com/redirect.php directly redirects correctly.)
The Question:
Why does the redirect fail when it goes through the reverse proxy, and how can I fix it?
Hints
I've discovered a few things that could be helpful...
I know that if I comment out "ProxyHTMLLinks a href", this will work correctly. But obviously, this is the rewrite functionality I need.
I can also change redirect.php to the following, and it works correctly:
<?php
header("Location:http://internal.example.com/landing.php");
?>
random text
I guess this text somehow changes the page or the HTTP headers in a way that makes mod_proxy_html (or, more specifically, ProxyHTMLLinks) behave differently than it does without it.
I can also change the redirect.php page to the following and have it work:
<?php
header("Location:http://internal.example.com/landing.php");
header("Content-Type:");
?>
This works because ProxyHTMLLinks, by default, is only applied to responses with Content-Type text/html. However, I don't want to hack every call to header("Location:...") to make this work. I don't mind changing all of those calls, as long as the change corrects a real problem rather than adding a hack.
Lastly, I've done some packet sniffing on the reverse proxy server and discovered that the header("Location:...") call makes the internal server send an HTTP/1.1 302 Found to the reverse proxy server, but the proxy doesn't pass it on to the browser requesting redirect.php. When I try one of the "solutions" above, the 302 is passed from the reverse proxy server to the computer requesting redirect.php. My understanding is that the Location header should reach the browser, and the browser should then request the new location. So it is failing because the 302 never makes it to the browser...
FYI, I've looked at the error logs to see whether mod_proxy_html is failing somewhere, but I don't see anything. I'm open to specific suggestions about logging, though, since I'm not 100% sure I've set it up correctly.
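One way to get more detail out of the logs, assuming Apache 2.4 (which supports per-module log levels; the Order/Allow syntax in the config above suggests 2.2, where only a single global level such as LogLevel debug is available), is to raise the level for just the proxy and filter modules inside the virtual host. The log file name below is a hypothetical placeholder:

```apache
<VirtualHost *:80>
    ServerName example.com
    # Hypothetical log path, relative to ServerRoot
    ErrorLog logs/example.com-error.log
    # Keep everything else at warn, but log the proxy modules
    # and the HTML/charset filters at debug
    LogLevel warn proxy:debug proxy_http:debug proxy_html:debug xml2enc:debug
    # ... rest of the proxy configuration as above ...
</VirtualHost>
```

mod_xml2enc is included because mod_proxy_html relies on it for charset handling, so its messages may be relevant to a response that disappears inside the filter chain.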
Sorry this is so long, just trying to be as specific as possible. Thanks in advance!
Solution
I figured out the problem. I needed to explicitly specify the charset in the Content-Type header for this to work. I did this by adding:
AddDefaultCharset utf-8
to my Apache config file. This globally fixed all calls to header("Location:...") without having to add header("Content-Type:") or header("Content-Type:text/html;charset=utf-8") to each one of them.
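For reference, this is roughly what the fix looks like in context. Placing AddDefaultCharset in the proxy's virtual host is an assumption, since the post only says "my Apache config file". AddDefaultCharset only adds a charset parameter to text/plain and text/html responses, and it labels them as UTF-8, so it assumes the internal application really does emit UTF-8.

```apache
<VirtualHost *:80>
    ServerName example.com
    # <Proxy *> block as in the original config
    ProxyPass / http://internal.example.com/
    ProxyPassReverse / http://internal.example.com/
    # Give text/html responses an explicit charset, so that empty
    # 302 responses pass through mod_proxy_html to the client
    AddDefaultCharset utf-8
    ProxyHTMLLinks a href
    ProxyHTMLEnable On
    RequestHeader unset Accept-Encoding
</VirtualHost>
```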
In short, what I'm saying is that mod_proxy_html's ProxyHTMLLinks causes a 302 Found not to be forwarded from the reverse proxy server to the client if a) the Content-Type is text/html (so ProxyHTMLLinks applies), b) no charset is set, and c) the page returns no content.
In my opinion, this is a normal scenario: pages that process form input often meet all three criteria.
It's possible that this is for some reason the intended behavior and that I'm doing something else wrong, but I can't see what that would be. At least there is an elegant workaround here in case anyone finds it useful.