I need to perform a very specific request on a legacy system and discovered that during a GET request the HTTP libraries are changing any %2C back to a , character.
I see the same issue with net-http, httparty, faraday, and open-uri, even though they use different implementations:
2.5.0 :001 > require 'net/http'
=> true
2.5.0 :002 > require 'erb'
=> true
2.5.0 :003 > link = "http://example.com?q=" + ERB::Util.url_encode("Hello, World")
=> "http://example.com?q=Hello%2C%20World"
2.5.0 :004 > uri = URI(link)
=> #<URI::HTTP http://example.com?q=Hello%2C%20World>
2.5.0 :005 > res = Net::HTTP.get_response(uri)
=> #<Net::HTTPOK 200 OK readbody=true>
All this looks good until I look at the actual request with VCR:
http_interactions:
- request:
method: get
uri: http://example.com/?q=Hello,%20World
body:
encoding: US-ASCII
string: ''
...
How do I keep the request as http://example.com?q=Hello%2C%20World?
The , character is legal in a query (an extended clarification is here: https://stackoverflow.com/a/31300627/3820185).
ERB::Util.url_encode, however, escapes every character matching [^a-zA-Z0-9_\-.]:
# File erb.rb, line 930
def url_encode(s)
s.to_s.dup.force_encoding("ASCII-8BIT").gsub(/[^a-zA-Z0-9_\-.]/n) {
sprintf("%%%02X", $&.unpack("C")[0])
}
end
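To make the effect of that character class concrete, here is a minimal sketch that runs a few characters RFC 3986 allows literally in a query through ERB::Util.url_encode. The constant names QUERY_LEGAL and ENCODED are made up for this illustration; only ERB::Util.url_encode comes from the standard library.

```ruby
require 'erb'

# Characters that RFC 3986 permits unescaped in a query component.
# (Hypothetical constant name, used only for this illustration.)
QUERY_LEGAL = [',', ';', ':', '@', '/', '?'].freeze

# Map each character to what ERB::Util.url_encode turns it into.
ENCODED = QUERY_LEGAL.map { |ch| [ch, ERB::Util.url_encode(ch)] }.to_h

# Every one of them is percent-encoded, e.g. "," becomes "%2C".
ENCODED.each { |ch, enc| puts "#{ch.inspect} -> #{enc}" }
```

In other words, url_encode is stricter than the query grammar requires, so its output is valid but not the only valid spelling of the same query.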
So when the request is made, the query is most probably re-parsed and normalized to conform to the actual standard, which turns %2C back into a literal , character.
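One way to narrow down which layer does the rewriting is to look at the request target before anything is sent or recorded. The sketch below is only a diagnostic aid under stated assumptions: request_target is a hypothetical helper name, and it inspects the Net::HTTP::Get object without opening a connection, so it says nothing about what other libraries or a recorder such as VCR do afterwards.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns the path-plus-query string that a
# Net::HTTP::Get request object holds for the given link.
# Building the request object does not touch the network.
def request_target(link)
  Net::HTTP::Get.new(URI(link)).path
end

puts request_target('http://example.com?q=Hello%2C%20World')
# => /?q=Hello%2C%20World
```

If the value printed here still contains %2C but the recorded cassette does not, the rewrite happens somewhere after the request object is built.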
EDIT
Also, you don't need to use ERB::Util.url_encode at all. You can just pass your URL to URI, and it will properly escape it according to the standards.
irb(main):001:0> require 'net/http'
=> true
irb(main):002:0> link = URI 'http://example.com?q=Hello, World'
=> #<URI::HTTP http://example.com?q=Hello,%20World>
irb(main):003:0>