When the path contains `%25`, Flask seems to mutate the incoming path, treating `%25` as `%` instead of preserving the original request path. Here are the request and the path variable:
`GET http://localhost:5000/Files/dir %a/test %25a.txt`

`request.base_url`: `http://localhost:5000/Files/dir%20%25a/test%20%25a.txt`

`127.0.0.1 - - [14/Feb/2023 12:00:49] "GET /Files/dir%20%a/test%20%25a.txt HTTP/1.1" 200 -`
Specifically, `test %25a.txt` seems to be encoded as `test%20%25a.txt` instead of `test%20%2525a.txt`.
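The effect can be reproduced with the standard library alone: decode the logged path once, re-encode it, and the client's `%25` comes back as a single `%25`. This is a sketch of the decode/re-encode effect using `urllib.parse`; it does not claim to be Werkzeug's exact code path.

```python
from urllib.parse import quote, unquote

# Path exactly as it appears in the server log (what the client sent).
raw_path = "/Files/dir%20%a/test%20%25a.txt"

# Step 1: decode percent-escapes once.
# "%a" is not a valid escape (it needs two hex digits), so unquote leaves it alone,
# while "%25" becomes a literal "%".
decoded = unquote(raw_path)

# Step 2: re-encode the decoded path ("/" is kept as-is by default).
reencoded = quote(decoded)

print(decoded)    # /Files/dir %a/test %a.txt
print(reencoded)  # /Files/dir%20%25a/test%20%25a.txt
```

The re-encoded result matches the path part of `request.base_url` above: after one decode, the server has no way to know that the client meant a literal `%25`.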
It seems that `%25` is not allowed in URL paths (Ref: In URL `%` is replaced by `%25` when using `queryParams` while routing in Angular). Is `%25` indeed not allowed in the request path? If it is not allowed, what would be a good way to handle this?

RFC 7230 § 2.7 (https://www.rfc-editor.org/rfc/rfc7230) explains that the path is composed of `pchar`s, which (roughly) are `unreserved` or `pct-encoded`.
Your favorite character definitely does not fall into the `unreserved` category, or into the similar `sub-delims` category:

`unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"`
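A small checker can make the grammar concrete. This sketch turns the RFC 3986 rules (`pchar = unreserved / pct-encoded / sub-delims / ":" / "@"`, `segment = *pchar`, and the `path-abempty = *( "/" segment )` rule that http URIs use) into a regular expression. `is_valid_path` is a hypothetical helper, not part of Flask or Werkzeug.

```python
import re

# RFC 3986 building blocks.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"

# path-abempty = *( "/" segment ), segment = *pchar
PATH_ABEMPTY = re.compile(rf"(?:/{PCHAR}*)*")


def is_valid_path(path: str) -> bool:
    """Return True if `path` matches the path-abempty rule character by character."""
    return PATH_ABEMPTY.fullmatch(path) is not None


print(is_valid_path("/Files/dir%20%25a/test%20%25a.txt"))  # True
print(is_valid_path("/Files/dir%20%a/test%20%25a.txt"))    # False: "%a" is not pct-encoded
```

A bare `%` is only legal as the start of a two-hex-digit escape, which is why the `%a` in the logged request is not a valid path at all.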
So that leaves us with a percent-encoded `%25`, which the spec (RFC 3986 § 2.4) addresses further:
> Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI. Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, ...
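The "exactly once" rule is easy to see with `urllib.parse`: the literal filename from the question survives one encode/decode round trip, but a second decode misreads its `%25` as an escape. A minimal sketch; `filename` is simply the example name from the question.

```python
from urllib.parse import quote, unquote

filename = "test %25a.txt"  # the literal name, containing the characters "%25"

# Encode exactly once: "%" becomes "%25", so "%25" becomes "%2525".
encoded = quote(filename)
print(encoded)                    # test%20%2525a.txt

# Decode exactly once: the original name comes back intact.
print(unquote(encoded))           # test %25a.txt

# Decoding a second time treats the data "%25" as an escape and corrupts it.
print(unquote(unquote(encoded)))  # test %a.txt
```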
And that is where things went south for you.
Now, one can tilt at windmills until Don Quixote brings the cows home, but the fact of the matter is that software is made of bugs, and they can be hard to isolate and get folks to fix.
The usual pragmatic approach to sending a "forbidden" character such as percent is to disguise it as it makes its way through a software stack. Here are two common techniques:
1. `~` tilde. Map percent to tilde and vice versa. Prohibit tilde in pathnames, or use percent-encoded `%7E` for it.
2. Encoding, e.g. base64. This tends to leave your URLs a bit uglier and a bit less informative than they would have been, but you can limit that. Given a pathname `p`, either it contains a percent or it doesn't:
   - If it doesn't, prepend `0`; now it survives untouched, in a form that can be `grep`'d.
   - If it does, prepend `1`, then use base64 or whatever.

   On the other end, strip the leading digit and process appropriately.
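Both techniques can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, tilde is simply rejected rather than mapped to `%7E`, and "base64 or whatever" is taken to mean URL-safe base64 from the standard library.

```python
import base64


# Technique 1: tilde substitution.
# Assumption: "~" never appears in real pathnames, so it is rejected outright.
def tilde_disguise(path: str) -> str:
    if "~" in path:
        raise ValueError("tilde is prohibited in pathnames under this scheme")
    return path.replace("%", "~")


def tilde_reveal(path: str) -> str:
    return path.replace("~", "%")


# Technique 2: one-digit prefix, base64 only when a percent is present.
def prefix_disguise(path: str) -> str:
    if "%" not in path:
        return "0" + path  # untouched and still grep-able
    # The URL-safe alphabet uses "-" and "_" instead of "+" and "/".
    encoded = base64.urlsafe_b64encode(path.encode("utf-8")).decode("ascii")
    return "1" + encoded


def prefix_reveal(token: str) -> str:
    flag, body = token[:1], token[1:]
    if flag == "0":
        return body
    if flag == "1":
        return base64.urlsafe_b64decode(body.encode("ascii")).decode("utf-8")
    raise ValueError(f"unknown prefix {flag!r}")


name = "dir %a/test %25a.txt"
print(tilde_disguise(name))                   # dir ~a/test ~25a.txt
print(prefix_reveal(prefix_disguise(name)) == name)  # True
```

The prefix variant keeps percent-free names readable in logs and URLs, and only pays the base64 readability cost for the names that actually need it.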