I have updated Apache today (to 2.4.56-1), and a load of .htaccess rewrites that used to work are now failing with AH10411 errors relating to spaces in the query string. I'm struggling to find a 'proper' solution.
The user clicks on a link such as <a href='FISH%20J12345.6-78919'>clickme</a>; as you can see, the space in the link URL has been encoded as %20.
The .htaccess file in the relevant server directory contains (and executes) this directive:
RewriteRule ^(FISH\s*J[0-9\.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1 [L,QSA]
(In the above I am checking for spaces, not %20, as the browser seems to be converting it to a space before it reaches this rule.)
This was working until I updated Apache; now users get a 403 error, and my Apache error log reports:
AH10411: Rewritten query string contains control characters or spaces
This appears to be a new error, because Googling it finds nothing!
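The following simplified Python model shows roughly what the error message describes. It is not taken from Apache's source. It assumes that any space, ASCII control character or DEL in the rewritten query string triggers AH10411. `rejects_query` is a hypothetical name.

```python
def rejects_query(query: str) -> bool:
    """Return True if `query` contains a space or an ASCII control character.

    Simplified model (an assumption, not Apache code) of the AH10411 check:
    any character <= 0x20 (controls and space) or 0x7F (DEL) is refused.
    """
    return any(ord(ch) <= 0x20 or ord(ch) == 0x7F for ch in query)


# The original rule copies the decoded space straight into the query string.
print(rejects_query("sourceName=FISH J12345.6-78919"))   # True
# The workaround substitutes a "+" instead, which is a visible character.
print(rejects_query("sourceName=FISH+J12345.6-78919"))   # False
```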
Editing my pages to (for example) change the space to an underscore and handle it correctly is not really an option, as the design is intended to support users being able to enter a URL directly using the name of the object they care about. So far, the only workaround I've found is a bit ugly, namely capturing the two parts of the source name separately in the regexp, thus:
RewriteRule ^(FISH)\s*(J[0-9\.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1+$2 [L,QSA]
(I tried $1%20$2 at the end, which also resulted in the same error.)
Is there a better solution for this? i.e. how am I "supposed" to handle the case of spaces in a URL, when it's in a string I want to capture and pass as an argument to the underlying page?
> (I tried $1%20$2 at the end, which also went badly).
This looks like a bug. Encoding the space as %20 in the query string should be valid. You can also encode the space as + in the query string (as in your workaround).
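The short Python sketch below shows that both spellings decode to the same value on the receiving side. It uses `urllib.parse.parse_qs` as a stand-in for PHP's own parsing of `$_GET`. This assumes PHP decodes `+` and `%20` the same way, which is its usual behaviour.

```python
from urllib.parse import parse_qs

# Two spellings of the same query string: space as "%20" and space as "+".
as_percent = parse_qs("sourceName=FISH%20J12345.6-78919")
as_plus = parse_qs("sourceName=FISH+J12345.6-78919")

# Both decode to the same value, with a real space in the middle.
print(as_percent)             # {'sourceName': ['FISH J12345.6-78919']}
print(as_plus == as_percent)  # True
```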
In your original rule, Apache should be encoding the space (as %20) when making the internal rewrite, since a literal space is not valid in the URL. However, it would seem that Apache is then baulking at the encoded space (or not auto-encoding the URL in the rewrite)?!
You can try using the B flag in your original rule. The B flag tells mod_rewrite to URL-encode the backreference before applying it to the substitution string. However, this would seem to depend on Apache encoding the space as + in the query string (as opposed to %20, which it would ordinarily do). Certainly in earlier versions of Apache, this would only have resulted in Apache encoding the space as %20 (not +). However, Apache 2.4.26 introduced a new flag, BNP (backrefnoplus), which explicitly tells Apache not to use a +, so you would think that by default it would use a +. (Unfortunately I can't test this myself at the moment.)
For example:
RewriteRule ^(FISH\s*J[\d.]+-?\+?\d+)$ myPage.php?sourceName=$1 [B,QSA,L]
(Minor point: there is no need to backslash-escape a literal dot inside a regex character class. I also reduced the digit ranges to the shorthand \d.)
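The B/BNP behaviour described above can be modelled in a few lines of Python. This is a hypothetical sketch: `escape_backref` is not Apache code. It follows the documentation's description of B as escaping non-alphanumeric characters, and it assumes a space becomes `+` unless BNP is set. The real implementation may leave a few punctuation characters unescaped, so treat the output as illustrative.

```python
def escape_backref(value: str, noplus: bool = False) -> str:
    """Rough model of the [B] flag: %-encode every non-alphanumeric character.

    A space becomes "+" unless noplus=True (modelling [BNP]), in which case
    it becomes "%20". Hex digits are written in lower case.
    """
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)  # letters and digits are copied unchanged
        elif ch == " " and not noplus:
            out.append("+")  # assumed default [B] behaviour for a space
        else:
            # encode each UTF-8 byte of the character as %xx
            out.extend(f"%{byte:02x}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(escape_backref("FISH J12345"))               # FISH+J12345
print(escape_backref("FISH J12345", noplus=True))  # FISH%20J12345
```

Either result is a valid query-string value with no literal space in it.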
Aside: can you have both - and + at the same time before the last set of digits (denoted by the subpattern -?\+?)? It looks like it should perhaps be one or the other (or nothing at all), e.g. [-+]?.
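Both regex remarks above can be checked with Python's `re`, used here as a stand-in for Apache's PCRE. `re.ASCII` restricts `\d` to 0-9, which is assumed to match PCRE's default. The sample strings are illustrative, and `one_sign` is a hypothetical name for the suggested `[-+]?` variant.

```python
import re

# The question's pattern and the simplified one from the answer.
original = re.compile(r"^(FISH\s*J[0-9\.]+-?\+?[0-9]+)$", re.ASCII)
simplified = re.compile(r"^(FISH\s*J[\d.]+-?\+?\d+)$", re.ASCII)
# The suggested alternative: at most one sign before the last digits.
one_sign = re.compile(r"^(FISH\s*J[\d.]+[-+]?\d+)$", re.ASCII)

samples = ["FISH J12345.6-78919", "FISHJ12345.6+78919", "FISH J12x45", "FISH J12345.6-+78919"]
for s in samples:
    # Escaping the dot and spelling out 0-9 changes nothing about what matches.
    assert bool(original.match(s)) == bool(simplified.match(s)), s

print(bool(simplified.match("FISH J12345.6-+78919")))  # True: "-+" accepted
print(bool(one_sign.match("FISH J12345.6-+78919")))    # False: one sign only
```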
> Is there a better solution for this? i.e. how am I "supposed" to handle the case of spaces in a URL, when it's in a string I want to capture and pass as an argument to the underlying page?
Not really (although your solution is not strictly correct; see below). In your particular example, which only contains spaces, you shouldn't need to do anything, as mod_rewrite should automatically URL-encode any URL that is not valid. (There is an NE (noescape) flag to explicitly prevent mod_rewrite from doing this, which is sometimes necessary to prevent already-encoded characters from being doubly encoded.) You can always use the B flag in URL rewrites of this form (as mentioned above). You would need to use the B flag if there were other special characters, such as & (a special character in the query string), which would not otherwise be URL-encoded, effectively truncating the URL parameter value.
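The truncation caused by an unescaped `&` can be shown with Python's `urllib.parse`. In this sketch `captured` is a hypothetical backreference value. `quote(..., safe="")` stands in for the B flag, although B would write the space as `+` rather than `%20`, which decodes to the same thing.

```python
from urllib.parse import parse_qs, quote

captured = "FISH & CHIPS"  # hypothetical backreference containing "&"

# Without escaping, "&" in the substitution starts a new parameter.
# (Only the space is encoded here, to isolate the "&" problem.)
unescaped = "sourceName=" + captured.replace(" ", "%20")
print(parse_qs(unescaped))  # {'sourceName': ['FISH ']}

# Escaping the whole backreference first keeps the value intact.
escaped = "sourceName=" + quote(captured, safe="")
print(parse_qs(escaped))    # {'sourceName': ['FISH & CHIPS']}
```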
> RewriteRule ^(FISH)\s*(J[0-9\.]+-?\+?[0-9]+)$ myPage.php?sourceName=$1+$2 [L,QSA]
An issue with your solution is that you are allowing zero (i.e. "none") or more spaces in the request but enforcing a single space in the resulting URL parameter. This is not the same as your original directive, which would preserve the spaces (or lack of them) from the original request.
Could there be zero or more spaces in the initial request? If so, and these need to be preserved, then it may be easier to repeat this rule for as many spaces as you need. You could implement a search/replace, but that may be overkill.
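The difference between the two rules can be sketched with Python's `re`, using `\1` and `\2` in place of mod_rewrite's `$1` and `$2`. The three paths below are illustrative inputs with zero, one and two spaces. For all three, the workaround produces `FISH+J12345.6-78919`, while the original rule captures the spacing unchanged.

```python
import re

# Python equivalents of the two RewriteRule patterns.
workaround = re.compile(r"^(FISH)\s*(J[0-9.]+-?\+?[0-9]+)$")
original = re.compile(r"^(FISH\s*J[0-9.]+-?\+?[0-9]+)$")

for path in ["FISHJ12345.6-78919", "FISH J12345.6-78919", "FISH  J12345.6-78919"]:
    # The workaround always emits exactly one "+" between the two parts...
    rewritten = workaround.sub(r"\1+\2", path)
    # ...whereas the original rule passes the spacing through unchanged.
    preserved = original.match(path).group(1)
    print(repr(rewritten), repr(preserved))
```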
> (In the above I am checking for spaces, not %20, as the browser seems to be converting it to space before it makes it to this rule).
The URL-path that the RewriteRule pattern matches against is first URL-decoded (%-decoded), which is why you need to match against a literal space and not %20. This has nothing to do with the "browser". Any literal spaces in the URL-path "must" be URL-encoded as %20 in the HTTP request that leaves the browser/user-agent; otherwise the request is simply not valid.
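The decode-then-match order can be sketched in Python. Here `urllib.parse.unquote` stands in for Apache's own %-decoding of the URL-path. Like Apache's path decoding, it leaves a literal `+` alone.

```python
import re
from urllib.parse import unquote

# The path as it appears on the wire (the href from the question).
raw_path = "FISH%20J12345.6-78919"

# mod_rewrite matches against the %-decoded path, so %20 is already a space.
decoded = unquote(raw_path)
print(repr(decoded))  # 'FISH J12345.6-78919'

pattern = re.compile(r"^(FISH\s*J[0-9.]+-?\+?[0-9]+)$")
print(bool(pattern.match(decoded)))   # True: \s matches the decoded space
print(bool(pattern.match(raw_path)))  # False: "%20" is not whitespace
```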
There was a comment (since deleted) in which the user was also passing a + (literal plus) in the URL-path and seemingly expecting it to be passed as-is to the query string (via an internal rewrite), where it would then be seen as an encoded space. However, using the B flag (as above) would result in the literal + being URL-encoded as %2b, thus preserving the literal +, which would ordinarily be the correct behaviour. If instead the + should be copied as-is, and thus seen as an encoded space (not a literal +) in the resulting query string, then you can restrict the non-alphanumeric characters that the B flag will encode (requires Apache 2.4.26+), i.e. exclude the +.
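The following sketch shows the two outcomes from the receiving script's point of view. It uses Python's `parse_qs` as a stand-in for PHP's `$_GET` parsing, and the sample value is illustrative.

```python
from urllib.parse import parse_qs

# A literal "+" copied as-is into the query string is read back as a space...
print(parse_qs("sourceName=FISH+J12345.6+78919"))
# {'sourceName': ['FISH J12345.6 78919']}

# ...whereas the [B]-escaped form "%2b" survives as a literal plus.
print(parse_qs("sourceName=FISH+J12345.6%2b78919"))
# {'sourceName': ['FISH J12345.6+78919']}
```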
For instance, you could limit the encoding to spaces and ? only:
RewriteRule ^(.+)$ index.php?query=$1 "[B= ?,L]"
The + will no longer be encoded in the backreference, so its special meaning in the query string (as an encoded space) still applies.
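The restricted form `[B= ?]` can be modelled with a small Python function. `escape_only` is a hypothetical name, not Apache code, and `captured` is an illustrative backreference. The sketch assumes that, as with plain B, a listed space is written as `+` unless BNP is also set. Characters not in the list (including `+`) are copied through untouched.

```python
def escape_only(value: str, chars: str, noplus: bool = False) -> str:
    """Model of [B=chars]: escape only the listed characters, copy the rest.

    Assumes, as with plain [B], that a listed space becomes "+" unless
    noplus=True, in which case it becomes "%20".
    """
    out = []
    for ch in value:
        if ch not in chars:
            out.append(ch)  # not in the list: copied unchanged
        elif ch == " " and not noplus:
            out.append("+")
        else:
            out.extend(f"%{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out)


captured = "red+fish blue?"          # hypothetical backreference
print(escape_only(captured, " ?"))   # red+fish+blue%3f
```

Here the original `+` and the escaped space both reach the query string as `+`, so the script sees `red fish blue?`.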
NB: You can't encode only spaces (since a space cannot be used as the last character in the B flag value argument), hence the additional ? character. Consequently, the flags argument needs to be surrounded by double quotes, since spaces are otherwise argument delimiters.
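Python's `shlex.split` can roughly illustrate the quoting requirement. It is only a stand-in for Apache's own config tokenizer, on the assumption that both split on whitespace and honour double quotes.

```python
import shlex

# shlex.split is used here as a rough stand-in for Apache's config tokenizer.
unquoted = shlex.split('RewriteRule ^(.+)$ index.php?query=$1 [B= ?,L]')
quoted = shlex.split('RewriteRule ^(.+)$ index.php?query=$1 "[B= ?,L]"')

print(unquoted[3:])  # ['[B=', '?,L]']  -> flags split into two arguments
print(quoted[3:])    # ['[B= ?,L]']     -> flags kept as one argument
```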
Reference: