I'm crawling pages and indexing them with the App Engine Search API (Spanish and Catalan pages, with accented characters). I'm able to perform searches and build a page of results.
The problem arises when I try to use a query object with snippeted_fields: it always raises a UnicodeEncodeError:
  File "/home/otger/python/jobs-gae/src/apps/search/handlers/results.py", line 82, in find_documents
    return index.search(query_obj)
  File "/opt/google_appengine_1.7.6/google/appengine/api/search/search.py", line 2707, in search
    apiproxy_stub_map.MakeSyncCall('search', 'Search', request, response)
  File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_stub_map.py", line 94, in MakeSyncCall
    return stubmap.MakeSyncCall(service, call, request, response)
  File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_stub_map.py", line 320, in MakeSyncCall
    rpc.CheckSuccess()
  File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_rpc.py", line 156, in _WaitImpl
    self.request, self.response)
  File "/opt/google_appengine_1.7.6/google/appengine/ext/remote_api/remote_api_stub.py", line 200, in MakeSyncCall
    self._MakeRealSyncCall(service, call, request, response)
  File "/opt/google_appengine_1.7.6/google/appengine/ext/remote_api/remote_api_stub.py", line 234, in _MakeRealSyncCall
    raise pickle.loads(response_pb.exception())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 52: ordinal not in range(128)
I've found a similar question on Stack Overflow, "GAE Full Text Search development console UnicodeEncodeError", but it says the bug was fixed in 1.7.0. I get the same error with both 1.7.5 and 1.7.6.
When indexing pages I add two fields: description and description_ascii. If I generate snippets for description_ascii, it works perfectly.
Is it possible to generate snippets of non-ASCII content on dev_appserver?
I think this is a bug; I've reported it as a new defect: https://code.google.com/p/googleappengine/issues/detail?id=9335.
Temporary solution for the dev server: locate the google.appengine.api.search module (search.py) and patch the function _DecodeUTF8 by adding an inline if, like this:
def _DecodeUTF8(pb_value):
    """Decodes a UTF-8 encoded string into unicode."""
    if pb_value is not None:
        return pb_value.decode('utf-8') if not isinstance(pb_value, unicode) else pb_value
    return None
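Why the guard helps, as far as I can tell: in Python 2, calling .decode('utf-8') on a value that is already unicode makes Python encode it to ASCII first, and that implicit step is what fails on u'\xf3'. Below is a standalone sketch of the same guard, not SDK code. The name decode_utf8 is made up for illustration, and it checks for bytes instead of unicode so that it also runs on Python 3.

```python
def decode_utf8(value):
    """Return text for a UTF-8 byte string; pass text and None through unchanged.

    Hypothetical helper for illustration only, not part of the App Engine SDK.
    """
    if value is None:
        return None
    if isinstance(value, bytes):
        # Only byte strings need decoding (on Python 2, bytes is str).
        return value.decode('utf-8')
    # Already text: decoding again is what triggers the implicit
    # ASCII encode on Python 2, so return it as-is.
    return value
```

With this guard, a value that arrives already decoded is returned untouched instead of being decoded a second time.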
Workaround: until the issue is solved, implement the snippet functionality yourself. Assuming the field the snippet is based on is called snippet_base:
query = search.Query(query_string=query_string,
                     options=search.QueryOptions(
                         ...
                         returned_fields=[... 'snippet_base' ...]
                     ))
results = search.Index(name="<index-name>").search(query)
if results:
    for res in results.results:
        res.snippet = some_snippeting_function(res.field("snippet_base"))
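A minimal sketch of what some_snippeting_function could look like, in case it helps. The name make_snippet, the max_chars parameter and the <b> markup are my own choices here, not anything the Search API prescribes. It takes plain text, so you would probably pass res.field("snippet_base").value rather than the field object itself.

```python
import re


def make_snippet(text, query_string, max_chars=160):
    """Return a window of text around the first query term, matches wrapped in <b>.

    Illustrative only: plain whitespace-separated terms, no query operators,
    and no HTML escaping of the text.
    """
    terms = [re.escape(term) for term in query_string.split() if term]
    if not terms:
        return text[:max_chars]
    pattern = re.compile(u'|'.join(terms), re.IGNORECASE | re.UNICODE)
    match = pattern.search(text)
    if match is None:
        # No term found: fall back to the beginning of the text.
        return text[:max_chars]
    # Start the window about half its width before the first match.
    start = max(0, match.start() - max_chars // 2)
    end = min(len(text), start + max_chars)
    window = text[start:end]
    snippet = pattern.sub(lambda m: u'<b>' + m.group(0) + u'</b>', window)
    prefix = u'...' if start > 0 else u''
    suffix = u'...' if end < len(text) else u''
    return prefix + snippet + suffix
```

In the loop above this would be used as res.snippet = make_snippet(res.field("snippet_base").value, query_string). If the snippet goes straight into a template, escape the text before adding the tags.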