I have a bit of a bizarre problem. I have a Solr index, which I query using curl like so:
curl 'http://localhost:8984/solr/my_index/select?indent=on&q="galvin%20life%20sciences"~0&wt=json&sort=_docid_%20desc&rows=5'
and I get (note the q string and the tilde operator, which I use for proximity search):
{
"responseHeader":{
"status":0,
"QTime":1,
"params":{
"q":"\"galvin life sciences\"~0",
"indent":"on",
"sort":"_docid_ desc",
"rows":"5",
"wt":"json"}},
"response":{"numFound":61,"start":0,"numFoundExact":true,"docs":[
Now I am trying to replicate the same thing in Python using:
resp=requests.get('http://localhost:8984/solr/my_index/select?q=' + "galvin%20life%20sciences"+"~0" + '&wt=json&rows=5&start=0&fl=id,org*,score')
and I get this:
[
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "galvin life sciences~0",
"fl": "id,org*,score",
"start": "0",
"rows": "5",
"wt": "json"
}
},
"response": {
"numFound": 3505398,
"start": 0,
"maxScore": 9.792607,
"numFoundExact": true,
"docs": [
You can see that the two queries differ:
curl: "q":"\"galvin life sciences\"~0",
requests: "q": "galvin life sciences~0",
so I am getting wrong results when using requests.
I am not sure what I should do in requests to make the queries match.
I have tried the solution of @Mats:
requests.get('http://localhost:8984/solr/my_index/select', params={
'q': '"galvin life sciences"~0',
'wt': 'json',
'rows': 5,
'start': 0,
'fl': 'id,org*,score',
})
but now I cannot pass the query in as a variable. So I have:
q_solr="Galvin life sciences"
requests.get('http://localhost:8984/solr/my_index/select', params={
'q': q_solr+'~0',
'wt': 'json',
'rows': 5,
'start': 0,
'fl': 'id,org*,score',
})
but this gives me no results.
You can either use requests' built-in support for creating URL parameters for you (which is what I'd recommend, as it lets you properly separate the parameters and requests handles escaping for you):
requests.get('http://localhost:8984/solr/my_index/select', params={
'q': '"galvin life sciences"~0',
'wt': 'json',
'rows': 5,
'start': 0,
'fl': 'id,org*,score',
})
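To see what requests actually sends, one option is to prepare the request without sending it and inspect the resulting URL. This is only a sketch: it reuses the host and index name from the question, never contacts Solr, and the names prepared and decoded_q are introduced here purely for illustration.

```python
from urllib.parse import parse_qs, urlsplit

import requests

params = {
    'q': '"galvin life sciences"~0',  # the quotes are part of the value
    'wt': 'json',
    'rows': 5,
    'start': 0,
    'fl': 'id,org*,score',
}

# Build the request without sending it, so no Solr server is needed.
prepared = requests.Request(
    'GET', 'http://localhost:8984/solr/my_index/select', params=params
).prepare()

# The encoded URL that would go over the wire.
print(prepared.url)

# Decode the query string again to confirm what Solr would receive as q.
decoded_q = parse_qs(urlsplit(prepared.url).query)['q'][0]
print(decoded_q)
```

The decoded q value should match the "q" entry that Solr echoes back under responseHeader.params in the curl response.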
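Otherwise you can build the URL yourself as you've done. The problem is that the double quotes in your code only delimit Python string literals; they are not part of the resulting string. By concatenating the strings instead of having " inside the previous string, you've just merged q= with galvin .. instead of "galvin, so Solr never sees the quotes and treats the words as separate terms. There's no need to end the previous string if the next one is included anyway. You can also use a backslash to escape any quotes inside a string if necessary.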
resp=requests.get('http://localhost:8984/solr/my_index/select?q="galvin%20life%20sciences"~0&wt=json&rows=5&start=0&fl=id,org*,score')
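The difference between the two spellings can be checked in plain Python, without requests or Solr. In this small sketch the path is shortened and the variable names broken and fixed are made up for the comparison:

```python
# Double quotes used as Python string delimiters vanish from the result.
broken = 'select?q=' + "galvin%20life%20sciences" + "~0"

# Double quotes placed inside a single-quoted string are kept.
fixed = 'select?q="galvin%20life%20sciences"~0'

print(broken)  # no quotes around the phrase
print(fixed)   # phrase is wrapped in double quotes
```

Only the second string carries the quotes that make Solr treat the words as a single phrase.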
But use the first form unless you're getting a preformatted URL from a different source.
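As for the follow-up about passing the phrase in as a variable: q_solr+'~0' leaves the quotes out again, so they have to be added around the variable's value. A minimal sketch follows; phrase_query is a hypothetical helper, not part of requests or Solr, and the escaping of backslashes and embedded quotes assumes the standard Lucene query parser syntax.

```python
def phrase_query(phrase, slop=0):
    """Wrap a phrase in double quotes and append a proximity (slop) value."""
    # Escape backslashes first, then embedded double quotes, so the
    # phrase remains a single quoted phrase (assumed Lucene syntax).
    escaped = phrase.replace('\\', '\\\\').replace('"', '\\"')
    return f'"{escaped}"~{slop}'


q_solr = "Galvin life sciences"

params = {
    'q': phrase_query(q_solr),  # value is "Galvin life sciences"~0, quotes included
    'wt': 'json',
    'rows': 5,
    'start': 0,
    'fl': 'id,org*,score',
}

print(params['q'])

# Then pass the dict as before:
# requests.get('http://localhost:8984/solr/my_index/select', params=params)
```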