Python 2.7 (although this manifested itself in Python 3 as well)
We have a database full of URLs of the form:
ftp://username1:password1@www.google.com/
ftp://username2:password2@www.google.com/
etc.
Should the passwords be encrypted and stored in a separate column? Probably.
One of our users recently changed the password of the batch job account to include a bracket. This crashes our script whenever it tries to urlparse the URL: urlparse interprets the bracket as part of a malformed IPv6 address. I think urlparse is failing to respect the @ symbol, but I could be wrong.
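A minimal reproduction, written against Python 3's `urllib.parse.urlsplit` (Python 2.7's `urlparse.urlsplit` raises the same error). The password `pa[ss` is a made-up stand-in for the real one, and `try_split` is a hypothetical helper for illustration only:

```python
from urllib.parse import urlsplit


def try_split(url):
    """Return urlsplit's result, or the ValueError message if it rejects the URL."""
    try:
        return urlsplit(url)
    except ValueError as exc:
        return "ValueError: %s" % exc


# Plain credentials parse fine.
print(try_split("ftp://username1:password1@www.google.com/").password)  # password1

# A bare '[' anywhere in the netloc (here in a hypothetical password) makes
# urlsplit treat it as an unbalanced bracketed IPv6 host.
print(try_split("ftp://username1:pa[ss@www.google.com/"))  # ValueError: Invalid IPv6 URL
```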
Anyway, we have a legacy system that was re-activated to handle this job, but it's not ideal. Any thoughts on how to handle this (other than changing the password)? Are there any alternatives to urlparse?
The Python 3 equivalent has the same issue. I would go through the pain of upgrading to Python 3 if I knew it would fix it.
In summary: Python behaves correctly. It is your expectation of the correct behavior that is wrong.
The syntax of a URI is defined in RFC 3986. The relevant part about userinfo (i.e. username or username:password) says clearly that a plain '[' is not allowed inside the userinfo:
authority   = [ userinfo "@" ] host [ ":" port ]
userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
pct-encoded = "%" HEXDIG HEXDIG
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
            / "*" / "+" / "," / ";" / "="
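The userinfo rule can be transcribed into a regular expression to check stored credentials before they ever reach urlparse. This is a sketch for Python 3 (it relies on `re.fullmatch`); `is_valid_userinfo` is a hypothetical helper name, and ALPHA/DIGIT are taken as ASCII only, as in the RFC:

```python
import re

# userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
_USERINFO = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:]"  # unreserved, sub-delims and ":"
    r"|%[0-9A-Fa-f]{2})*"               # pct-encoded: "%" HEXDIG HEXDIG
)


def is_valid_userinfo(userinfo):
    """Return True if the string matches the RFC 3986 userinfo rule."""
    return _USERINFO.fullmatch(userinfo) is not None


# Prints True, False, True for the three candidates.
for candidate in ["username1:password1", "username1:pa[ss", "username1:pa%5Bss"]:
    print(candidate, is_valid_userinfo(candidate))
```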
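As can be seen, '[' is part of neither unreserved nor sub-delims. Within the authority, brackets are only allowed around an IP-literal host (such as an IPv6 address), which is why urlparse reports a malformed IPv6 address. This means you have to percent-encode this character, i.e. write it as %5B.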
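A sketch of that fix for the rows already in the database, assuming Python 3's `urllib.parse` (on Python 2.7, `quote`/`unquote` live in `urllib` and `urlsplit` in `urlparse`). `encode_userinfo` is a hypothetical helper, not a library function. It assumes every stored URL has the shape `scheme://user[:password]@host/path`, that the last '@' separates the credentials from the host, and that the username contains no ':'. It also assumes the stored values are not already encoded, so it should be run once per row:

```python
from urllib.parse import quote, unquote, urlsplit


def encode_userinfo(raw_url):
    """Percent-encode the username and password of a legacy, unencoded URL.

    Hypothetical helper. Assumes scheme://user[:password]@host/path, that the
    LAST '@' separates credentials from host, and no ':' in the username.
    """
    scheme, sep, rest = raw_url.partition("://")
    if not sep:
        raise ValueError("no '://' in %r" % raw_url)
    userinfo, at, hostpart = rest.rpartition("@")
    if not at:
        return raw_url  # no credentials, nothing to encode
    user, colon, password = userinfo.partition(":")
    # safe="" so that '/', ':' and '@' are encoded as well, not just '['
    encoded = quote(user, safe="")
    if colon:
        encoded += ":" + quote(password, safe="")
    return "%s://%s@%s" % (scheme, encoded, hostpart)


fixed = encode_userinfo("ftp://username1:pa[ss@www.google.com/")
parts = urlsplit(fixed)  # no longer mistaken for an IPv6 literal
print(fixed)                    # ftp://username1:pa%5Bss@www.google.com/
print(parts.hostname)           # www.google.com
print(unquote(parts.password))  # pa[ss
```

Note that `urlsplit` returns the password still percent-encoded, so the script has to `unquote` it before handing it to the FTP login.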