Tags: apache, proxy, redhat, mod-proxy, mod-proxy-html

mod_proxy_html configuration - truncation issues


I've looked around as much as I can, and I've hit the point where I'm completely stumped.

I'm running a RedHat server with Apache on top, which I'm using as a proxy between the outside world and two other application servers that run on completely different platforms (one IIS, one Linux).

Both of these servers have correct internal network URLs that the applications residing on them understand. The apps (DotNetNuke and WordPress derived) both generate HTML5 pages, which contain the correct markup and render correctly outside of the proxy (i.e. on the internal network).

When these pages pass through the proxy, however, the CSS and JavaScript files come out with characters missing at the end.

So (in practice), JavaScript code like this:

... {return f})})(window);

or CSS like this:

...
background-position:center left;
background-repeat:no-repeat;
}

...turns into code like this:

... {return f})})(window

or like this:

...
background-position:center left;
background-repeat:no-re

The proxy setup is using the mod_proxy and mod_proxy_html Apache modules - and I'm fairly certain that the problem I'm encountering is to do with the configuration of mod_proxy_html, which currently looks like this:

ProxyHTMLEnable On
ProxyHTMLBufSize  102400
ProxyHTMLExtended On
ProxyHTMLStripComments Off
ProxyHTMLDocType "<!DOCTYPE html>"
ProxyHTMLMeta Off
#ProxyHTMLLogVerbose On
#LogLevel Debug

<Location /xxxxx>
        ProxyPass               http://www.example.com
        ProxyPassReverse        http://www.example.com
        ProxyHTMLURLMap         http://www.example.com /xxxxx
        ProxyHTMLURLMap         / /xxxxx/
</Location>

<Location />
        ProxyPass               http://10.11.0.51/
        ProxyPassReverse        http://10.11.0.51/
</Location>

Going through the Apache docs (http://httpd.apache.org/docs/2.4/en/mod/mod_proxy_html.html) hasn't given me any immediate clues, however.

Has anyone come across the same issue? Or is there something quick that I'm missing?

Any help would be gratefully received!

Updated:

Ultimately, the problem appeared to be the default behaviour of mod_proxy_html, which parses all content as UTF-8 encoded. Some of the content wasn't UTF-8, and that couldn't be amended, despite best efforts.

To work around this, after a bit of work, mod_substitute was used instead (it just parses text as text, ignoring file encoding), alongside a cache solution to speed up load times.

Shame mod_proxy_html didn't work for this project - but a way was found to do it in the end!
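
The exact mod_substitute configuration isn't given in the update, but the approach it describes might look roughly like the sketch below. This is an assumption-laden illustration rather than the configuration that was actually deployed: it reuses the /xxxxx location and www.example.com backend from the question, rewrites only the absolute backend URL (the root-relative "/" mapping is not reproduced), filters only text/html, and leaves out the caching layer entirely.

```apache
# Hypothetical sketch: rewrite backend URLs with mod_substitute
# instead of mod_proxy_html. Assumes mod_proxy, mod_proxy_http,
# mod_substitute, mod_filter and mod_headers are loaded.
<Location /xxxxx>
        ProxyPass               http://www.example.com
        ProxyPassReverse        http://www.example.com

        # Ask the backend for uncompressed responses, so the filter
        # sees plain text rather than gzip data
        RequestHeader unset Accept-Encoding

        # Run the substitution filter on HTML responses only
        # (assumption: add other MIME types here if they also need rewriting)
        AddOutputFilterByType SUBSTITUTE text/html

        # n = treat the pattern as a fixed string, i = case-insensitive
        Substitute "s|http://www.example.com|/xxxxx|ni"
</Location>
```

Because mod_substitute works line by line on the raw text, it doesn't go through the HTML parser or the character-set detection that mod_proxy_html relies on, which is presumably why the truncation no longer appeared.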


Solution

  • The problem IS related to a bug in the mod_proxy_html stack, specifically in mod_xml2enc (see http://apache-http-server.18135.x6.nabble.com/PATCH-mod-xml2enc-eats-end-of-file-td5001104.html)

    I was able to verify the described behavior (see Apache's error log with LogLevel debug) and the patch works for me, although its approach is not a perfect solution.

    Versions: mod_xml2enc (1.0.4), libxml2 (2.7.6-0.9.1)