
Why does Python's urllib.request.urlopen send POST data as query string?


curl correctly posts data to Solr:

$ curl -v 'http://solr.example.no:12699/solr/my_coll/update?commit=true' \
--data '<add><doc><field name="key">KEY__9927.1</field><field name="value">{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>'

The Solr query log says:

[20200306T111354,131] [my_coll_shard1_replica_n85]  webapp=/solr path=/update params={commit=true} status=0 QTime=96

I'm trying to do the same thing with Python:

>>> import urllib.request
>>> data = '<add><doc><field name="key">KEY__9927.1</field><field name="value">{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>'
>>> url = 'http://solr.example.no:12699/solr/my_coll/update?commit=true'
>>> req = urllib.request.Request(url=url, data=data.encode('utf-8'), method='POST')
>>> res = urllib.request.urlopen(req)

But now the Solr query log shows that the POST data has been appended to the query parameters:

[20200306T112358,780] [my_coll_shard1_replica_n87]  webapp=/solr path=/update params={commit=true&<add><doc><field+name="key">KEY__9927.1</field><field+name%3D"value">{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>} status=0 QTime=30

What is happening here?


Solution

  • The request is missing the correct Content-Type. When data is supplied without a Content-Type header, urllib.request defaults to application/x-www-form-urlencoded, so Jetty (or the Solr app) treats the body as form parameters. Any POSTed data that isn't multipart can be merged with the query string parameters, since Solr parses them all the same way, and that merged set is what ends up in the log. The /update endpoint accepts several formats, including JSON and XML, so the Content-Type has to say which one you're sending:

    req = urllib.request.Request(url=url, data=data.encode('utf-8'), method='POST', headers={'Content-Type': 'text/xml'})
    res = urllib.request.urlopen(req)
    

    The reason curl works without the header is its User-Agent string. curl --data also sends application/x-www-form-urlencoded by default, but Solr special-cases curl, by design, to override the default Content-Type handling. If you're not using curl, you have to state the Content-Type explicitly. This was probably done to make manual requests with curl on the command line easier. The implementation is available in SolrRequestParsers.java, line 782.
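
To see both effects without touching a Solr server, here is a minimal sketch. It assumes that running only urllib's HTTP pre-processing step is an acceptable way to inspect the headers, and that Python's parse_qsl is a reasonable stand-in for a form parser (Solr's own parsing may differ in detail). The helper name headers_urllib_would_send is made up for illustration.

```python
import urllib.request
from urllib.parse import parse_qsl

URL = 'http://solr.example.no:12699/solr/my_coll/update?commit=true'
BODY = ('<add><doc><field name="key">KEY__9927.1</field><field name="value">'
        '{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>')


def headers_urllib_would_send(req):
    """Hypothetical helper: run only urllib's HTTP pre-processing step.

    Nothing is sent over the network; the handler just fills in the
    default headers it would add before opening a connection.
    """
    opener = urllib.request.build_opener()
    for handler in opener.handlers:
        if isinstance(handler, urllib.request.HTTPHandler):
            req = handler.http_request(req)
            break
    return dict(req.header_items())


# 1. No Content-Type given: urllib falls back to the form-encoded default.
plain = urllib.request.Request(URL, data=BODY.encode('utf-8'), method='POST')
default_headers = headers_urllib_would_send(plain)
print(default_headers['Content-type'])  # application/x-www-form-urlencoded

# 2. Explicit Content-Type: the default is not applied.
typed = urllib.request.Request(URL, data=BODY.encode('utf-8'), method='POST',
                               headers={'Content-Type': 'text/xml'})
typed_headers = headers_urllib_would_send(typed)
print(typed_headers['Content-type'])  # text/xml

# 3. A form parser splits the XML body at its first '=': the text before it
#    becomes a parameter name and the rest becomes that parameter's value.
form_params = parse_qsl(BODY)
print(form_params)
```

The single name/value pair from step 3 has the same shape as the extra entry in the second log line of the question, where the body shows up split at the first '=' after commit=true.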