We're using django-markupfield's MarkupField to store Markdown text, and it works quite well.
However, when we try to index these fields in Wagtail, we get serialization errors from Elasticsearch like this:
File "/usr/local/lib/python3.5/dist-packages/wagtail/wagtailsearch/management/commands/update_index.py", line 120, in handle
self.update_backend(backend_name, schema_only=options.get('schema_only', False))
File "/usr/local/lib/python3.5/dist-packages/wagtail/wagtailsearch/management/commands/update_index.py", line 87, in update_backend
index.add_items(model, chunk)
File "/usr/local/lib/python3.5/dist-packages/wagtail/wagtailsearch/backends/elasticsearch.py", line 579, in add_items
bulk(self.es, actions)
File "/usr/local/lib/python3.5/dist-packages/elasticsearch/helpers/__init__.py", line 195, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/local/lib/python3.5/dist-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
for bulk_actions in _chunk_actions(actions, chunk_size, max_chunk_bytes, client.transport.serializer):
File "/usr/local/lib/python3.5/dist-packages/elasticsearch/helpers/__init__.py", line 61, in _chunk_actions
data = serializer.dumps(data)
File "/usr/local/lib/python3.5/dist-packages/elasticsearch/serializer.py", line 50, in dumps
raise SerializationError(data, e)
elasticsearch.exceptions.SerializationError: ({'_partials': [<markupfield.fields.Markup object at 0x7faa6e238e80>, <markupfield.fields.Markup object at 0x7faa6dbc4da0>], 'pk': '1', 'research_interests': <markupfield.fields.Markup object at 0x7faa6e238e80>, 'bio': <markupfield.fields.Markup object at 0x7faa6dbc4da0>}, TypeError("Unable to serialize <markupfield.fields.Markup object at 0x7faa6e238e80> (type: <class 'markupfield.fields.Markup'>)",))
One workaround is to index callables that return field.raw, but then we'd have to write one such callable for every Markdown field in our models. I thought we could get around this by extending the field property (i.e., the django-markupfield Markup class that replaces the MarkupField on model instances) with a get_searchable_content(value) method, but the serialization errors persist.
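For reference, here is a minimal sketch of why the field fails to serialize and what the per-field callable workaround looks like. The Markup and Profile classes are simplified stand-ins, not the real django-markupfield or Django classes, and the standard-library json module stands in for the Elasticsearch serializer; bio_raw is a hypothetical method name.

```python
import json


class Markup:
    """Simplified stand-in for markupfield.fields.Markup (assumption)."""

    def __init__(self, raw):
        self.raw = raw


class Profile:
    """Stand-in for a Django model with one Markdown field."""

    def __init__(self, bio):
        self.bio = Markup(bio)

    def bio_raw(self):
        # Callable indexed in place of the field itself; returns a plain str.
        return self.bio.raw


profile = Profile("I study **search**.")

error = None
try:
    # Indexing the field directly hands the serializer a Markup object.
    json.dumps({"bio": profile.bio})
except TypeError as exc:
    error = str(exc)

# Indexing the callable's return value serializes cleanly.
document = json.dumps({"bio": profile.bio_raw()})
```

In a real Wagtail model, such a method would be listed in search_fields by name (for example index.SearchField('bio_raw')), which is exactly the per-field boilerplate we'd like to avoid.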
Does anyone have any tips for indexing custom Django fields in Wagtail + Elasticsearch?
I was putting get_searchable_content in the wrong place. I thought it was needed on the Markup class, but it needs to be defined on the Django model Field class itself. Wagtail will then pull the appropriate value to be indexed in Elasticsearch (or any other search backend).
The most straightforward solution was to extend MarkupField with a custom Field class and add a get_searchable_content(self, value) method that delegates its implementation to MarkupField.get_prep_value.
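A rough sketch of that subclass follows. To keep it runnable on its own, Markup and MarkupField here are simplified stand-ins for the classes in markupfield.fields, and the stand-in get_prep_value assumes the real one unwraps a Markup object to its raw text; SearchableMarkupField is a name chosen for illustration. In a real project, the two stand-ins would be replaced by an import from markupfield.fields.

```python
import json


class Markup:
    """Simplified stand-in for markupfield.fields.Markup (assumption)."""

    def __init__(self, raw):
        self.raw = raw


class MarkupField:
    """Simplified stand-in for markupfield.fields.MarkupField (assumption)."""

    def get_prep_value(self, value):
        # Assumed behaviour: unwrap a Markup object to its raw text.
        return value.raw if isinstance(value, Markup) else value


class SearchableMarkupField(MarkupField):
    """Hypothetical custom field that exposes plain text to the search index."""

    def get_searchable_content(self, value):
        # Delegate to get_prep_value so the index receives a str, not Markup.
        return self.get_prep_value(value)


field = SearchableMarkupField()
content = field.get_searchable_content(Markup("# Research interests"))

# The returned value is a plain string, so a JSON serializer accepts it.
payload = json.dumps({"research_interests": content})
```

With the method on the field class, each model can keep listing the field itself in search_fields, and no per-field callable is needed.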