I would like to query Solr MLT (MoreLikeThis) terms from Python efficiently. I have a list of full names, e.g.:
names = ['Bobby Johnson', 'James Bob']
To query the MLT terms of each individual person in Solr, you have to use URLs like the following:
'http://localhost:8382/solr/core/mlt?q=name:"Bobby Johnson"&fl=*,score&mlt.fl=concepts&mlt.interestingTerms=details'
'http://localhost:8382/solr/core/mlt?q=name:"James Bob"&fl=*,score&mlt.fl=concepts&mlt.interestingTerms=details'
As the examples above show, a full name containing white space is wrapped in quotation marks in the query. This works, but writing each URL by hand is repetitive because the list of names is large.
When I try to do it more efficiently by querying every item in the list in a for-loop with f-strings, I get an InvalidURL error (see below). My code:
import json
from urllib.request import urlopen
for name in names:
    req = urlopen(f'http://localhost:8382/solr/core/mlt?q=name:"{name}",score&mlt.fl=concepts&mlt.interestingTerms=details')
    request_json = json.load(req)
    interesting_terms = request_json['interestingTerms']
    print(interesting_terms)
#Error message:
InvalidURL: URL can't contain control characters. '/solr/core/mlt?q=name:"Bobby Johnson",score&mlt.fl=concepts&mlt.interestingTerms=details' (found at least ' ')
Are there any specific ideas or examples for sending multiple requests in Python when the query contains white space?
Desired output: send a request for every full name in the list and get the information back in JSON format.
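You have to URL-encode the value when building the URL, before passing it to urlopen. quote_plus replaces the space with +, which is decoded back to a space in a query string.
The URL in your loop also lost the &fl=* in front of ,score compared with your working URLs, so it is restored here:
from urllib.request import urlopen
from urllib.parse import quote_plus
for name in names:
    req = urlopen(f'http://localhost:8382/solr/core/mlt?q=name:"{quote_plus(name)}"&fl=*,score&mlt.fl=concepts&mlt.interestingTerms=details')
    ...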
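An alternative to escaping the name by hand is to let urllib.parse.urlencode build the whole query string from a dict, so every parameter is encoded consistently. The following is only a minimal sketch, assuming the endpoint and parameters from the question; build_mlt_url and fetch_interesting_terms are hypothetical helper names, and the 'interestingTerms' key is taken from the question's code rather than verified against a live Solr instance:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint, copied from the question; adjust host, port and core as needed.
BASE_URL = 'http://localhost:8382/solr/core/mlt'


def build_mlt_url(name, base_url=BASE_URL):
    """Return the MLT request URL for one full name (hypothetical helper)."""
    params = {
        'q': f'name:"{name}"',          # phrase query; quotes keep the full name together
        'fl': '*,score',
        'mlt.fl': 'concepts',
        'mlt.interestingTerms': 'details',
    }
    # urlencode percent-encodes each value and turns spaces into '+'
    return f'{base_url}?{urlencode(params)}'


def fetch_interesting_terms(name):
    """Send one request and return the 'interestingTerms' part of the JSON reply."""
    with urlopen(build_mlt_url(name)) as response:
        return json.load(response)['interestingTerms']


# Usage, assuming a Solr instance is reachable at BASE_URL:
# for name in ['Bobby Johnson', 'James Bob']:
#     print(name, fetch_interesting_terms(name))
```

Keeping the URL construction in its own function makes it possible to inspect or test the generated URLs without a running server.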