Using mod_rewrite with variables (for proxying in htaccess)?


While there are a lot of questions dealing with this topic, I still cannot wrap my head around this particular case. Here is a generic example that should illustrate my problem:

I have an Apache server that does some proxying. My "application" is in htdocs/mydir, which is also where the .htaccess file is.

I have the following rule in the .htaccess, which works:

RewriteCond %{REQUEST_URI} ^/mydir/pics/first(/.*)? [OR]
RewriteCond %{REQUEST_URI} ^/mydir/pics/second(/.*)? [OR]
RewriteCond %{REQUEST_URI} ^/mydir/pics/third(/.*)?
RewriteRule ^pics/(.*)$ http://192.168.7.13:5055/$1 [P,L]

So, when I trigger the address http://localhost/mydir/pics/first/IMG12.jpg in my browser, Apache gets this request proxied/forwarded to http://192.168.7.13:5055/IMG12.jpg, which is what I'd want in this case. (Note that there could be 3rd level subdirectories other than "first", "second" or "third", which I would not like processed as above).

Now, I have several "subdirectories" under mydir that I'd like to handle besides pics: say, videos, drawings, songs, etc. I'd like to avoid repeating all of the above rules for each subdirectory, and instead use some kind of variable that is extracted from the request.

I'm aware that I can capture the subdirectory portion in an environment variable - this rule also works for me, and sets the SUBDIR variable to whatever the right subdirectory is in the request:

RewriteCond %{REQUEST_URI} ^/mydir/(pics|videos|drawings|songs)(?:/.*)?
RewriteRule ^ - [E=SUBDIR:%1]

Now, the problem is: how can I use this variable as a stand-in for the verbatim subdirectory in the proxying RewriteRule? As noted in other questions, such as "What Double Colon does in RewriteCond?" and "%N backreference inside RewriteCond", this does not work:

RewriteCond %{REQUEST_URI} ^/mydir/(pics|videos|drawings|songs)(?:/.*)?
RewriteRule ^ - [E=SUBDIR:%1]

RewriteCond %{REQUEST_URI} ^/mydir/%{SUBDIR}/first(/.*)? [OR]
RewriteCond %{REQUEST_URI} ^/mydir/%{SUBDIR}/second(/.*)? [OR]
RewriteCond %{REQUEST_URI} ^/mydir/%{SUBDIR}/third(/.*)?
RewriteRule ^%{SUBDIR}/(.*)$ http://192.168.7.13:5055/$1 [P,L]

... because of limitations of what can be used in left-hand-side vs. right-hand-side of RewriteRule vs RewriteCond (which I still cannot understand fully).

However, looking at this example - would it somehow be possible to rewrite the above, so a variable is used to extract the 2nd level subdirectory, and it is then used to match in the RewriteRule?


Solution

    "when I trigger the address http://localhost/mydir/pics/first/IMG12.jpg in my browser, Apache gets this request proxied/forwarded to http://192.168.7.13:5055/IMG12.jpg"

    To clarify, according to the directives you've posted, it would forward the request to http://192.168.7.13:5055/first/IMG12.jpg, not simply to /IMG12.jpg.

    You could do this in a single rule; the additional conditions appear to be superfluous. For example:

    RewriteRule ^(pics|videos|drawings|songs)/(first|second|third)/(.*) http://192.168.7.13:5055/$2/$3 [P]
    

    However, there is a potential issue with this, as you are not currently passing the first directory through to the target. So, for instance, you couldn't have a file in the pics subdirectory with the same name as one in the drawings subdirectory, since both would proxy to the same file.
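
    If the name collision matters, one option is to pass the first directory through as well. The following is only a sketch: it assumes the backend at 192.168.7.13:5055 serves files under a matching /pics/..., /videos/... layout, which the question does not confirm.

    ```apache
    # Assumption: the backend mirrors the top-level subdirectory names.
    # $1 = pics|videos|drawings|songs, $2 = first|second|third, $3 = the rest
    RewriteRule ^(pics|videos|drawings|songs)/(first|second|third)/(.*) http://192.168.7.13:5055/$1/$2/$3 [P]
    ```

    With this variant, /mydir/pics/first/IMG12.jpg and /mydir/drawings/first/IMG12.jpg would be proxied to different backend URLs.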

    I'm assuming you wouldn't want to proxy a request for /pics/first?

    Is it really the intention to match anything after /pics/first/.....? E.g. would /pics/first/foo/bar/baz.jpg be a valid request?
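
    If the answer to both questions is "no", the pattern could be tightened. This is a hedged sketch that assumes only a single file name (no further subdirectories) is expected after first, second or third:

    ```apache
    # Assumption: exactly one non-empty path segment follows first|second|third.
    # Matches  pics/first/IMG12.jpg
    # Rejects  pics/first, pics/first/ and pics/first/foo/bar/baz.jpg
    RewriteRule ^(pics|videos|drawings|songs)/(first|second|third)/([^/]+)$ http://192.168.7.13:5055/$2/$3 [P]
    ```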

    The L flag is not strictly required with P since it is implied.


    Aside:

    ^/mydir/pics/first(/.*)?

    The trailing (/.*)? isn't actually doing anything in this regex, since it's entirely optional. So, the above will match /mydir/pics/first, /mydir/pics/firstanything and /mydir/pics/first/anything. (And the result of the captured group is not used.)
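
    If the intention was to match only the directory itself or anything beneath it, the condition could end with an explicit "slash or end of string" test instead. A minimal sketch of that one condition:

    ```apache
    # Matches /mydir/pics/first and /mydir/pics/first/anything,
    # but not /mydir/pics/firstanything
    RewriteCond %{REQUEST_URI} ^/mydir/pics/first(/|$)
    ```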

    "because of limitations of what can be used in left-hand-side vs. right-hand-side of RewriteRule vs RewriteCond (which I still cannot understand fully)"

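    You can't use a construct of the form %{VAR} (or %{ENV:VAR}) inside a regular expression because it conflicts with the syntax of the regex engine (PCRE). There would need to be a pre-processing "variable substitution" step that occurs before the regex is compiled, but that does not happen. Variables of this form are expanded only in the RewriteCond TestString and in the RewriteRule substitution string, not in either pattern.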
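
    To illustrate where such variables are expanded, here is a small sketch. It assumes (hypothetically) that the backend expects the subdirectory name in its path; adjust the target if it does not.

    ```apache
    # Pattern side: plain regex, no %{...} allowed. Records the subdirectory.
    RewriteRule ^(pics|videos|drawings|songs)(/|$) - [E=SUBDIR:$1]

    # TestString side: %{ENV:SUBDIR} is expanded before the regex is applied
    RewriteCond %{ENV:SUBDIR} ^(pics|videos|drawings|songs)$
    # Substitution side: %{ENV:SUBDIR} is expanded here as well
    RewriteRule ^[^/]+/((?:first|second|third)/.*)$ http://192.168.7.13:5055/%{ENV:SUBDIR}/$1 [P]
    ```

    The variable appears only in the first argument of RewriteCond and in the second argument of RewriteRule; both regex arguments contain literal patterns.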


    UPDATE:

    "'You can't use a construct of the form...' - indeed, but I've seen examples where backreferences like $1, %1 or \1 are used, so I was hoping someone could show me how to use those to 'propagate' %{ENV:VAR} to the regex (if at all possible)."

    You won't have seen $1 or %1 used in the CondPattern or RewriteRule pattern arguments (both regex by default), for the same reason as mentioned above. However, you can use \1 - this is an internal backreference and is part of the regex (PCRE) syntax.

    So, as a purely contrived academic exercise, you could do something like the following (extending your example):

    # Sets SUBDIR env var to one of the allowed subdirectories
    # ...but only when that subdirectory is requested
    # ...otherwise SUBDIR is "empty"
    RewriteRule ^(pics|videos|drawings|songs)(/|$) - [E=SUBDIR:$1]
    
    # Test that one of the stated SUBDIR is present and proxy the request on success
    RewriteCond %{REQUEST_URI}@@%{ENV:SUBDIR} ^/mydir/([^/]+)/first(/.*)?@@\1 [OR]
    RewriteCond %{REQUEST_URI}@@%{ENV:SUBDIR} ^/mydir/([^/]+)/second(/.*)?@@\1 [OR]
    RewriteCond %{REQUEST_URI}@@%{ENV:SUBDIR} ^/mydir/([^/]+)/third(/.*)?@@\1
    RewriteRule ^([^/]+)/(.*)$ http://192.168.7.13:5055/$2 [P]
    

    The same principle applies as with any RewriteCond directive. In the case of the first condition, you have the TestString %{REQUEST_URI}@@%{ENV:SUBDIR} that must match the regex ^/mydir/([^/]+)/first(/.*)?@@\1 for the condition to be successful.

    The @@ is just an arbitrary string that does not occur elsewhere in the string to be matched, so we can use it as a delimiter between the value of REQUEST_URI and SUBDIR.

    The internal backreference \1, at the end of the regex, refers to the first captured group in the regex, i.e. whatever ([^/]+) matches. This must match the value of %{ENV:SUBDIR} in the TestString.

    For example, given a request for /mydir/pics/first/img.jpg then SUBDIR is set to pics (the .htaccess file is in the /mydir subdirectory). So the TestString (when expanded) becomes /mydir/pics/first/img.jpg@@pics. The \1 backreference contains pics from the second path segment, so this matches the TestString ...@@pics and the condition is successful.

    On the other hand, if the request were for /mydir/foo/first/img.jpg then SUBDIR would not be set (effectively empty) and the TestString becomes /mydir/foo/first/img.jpg@@. However, the backreference \1 contains foo, so it fails to match the empty string at the end of the TestString. That is, ...@@foo (regex) does not match ...@@ (TestString) and the condition fails.
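
    As a further sketch of the same technique, the three OR'd conditions could be collapsed into one by using alternation inside a non-capturing group, so that \1 still refers to the subdirectory segment. The trailing $ is an addition of mine that forces SUBDIR to equal the captured segment in full, not merely start with it:

    ```apache
    RewriteRule ^(pics|videos|drawings|songs)(/|$) - [E=SUBDIR:$1]

    # (?:...) groups do not capture, so \1 is still ([^/]+)
    RewriteCond %{REQUEST_URI}@@%{ENV:SUBDIR} ^/mydir/([^/]+)/(?:first|second|third)(?:/.*)?@@\1$
    RewriteRule ^[^/]+/(.*)$ http://192.168.7.13:5055/$1 [P]
    ```

    This remains as contrived as the three-condition version; it only shows that the backreference comparison does not depend on repeating the condition.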

    Just to stress: this is a very contrived example, and you wouldn't use this method in this case. For instance, the very fact that SUBDIR is set at all means that one of the stated subdirectories has been accessed, so the (complex) conditions are superfluous. And since the allowed subdirectories are known (and there are only a few), you might as well put the regex alternation, i.e. (pics|videos|drawings|songs), directly in the rule, as I did in the example at the top of this answer.