full-text-search plone zope dexterity

Custom SearchableText and HTML fields in Plone


I am writing a Dexterity content type which contains plain text and HTML fields. I want to have a custom SearchableText() method which exposes these fields to portal_catalog and Plone full text search.

I assume that for plain text I can just join the strings with spaces. But how should I preprocess HTML content when exposing it in SearchableText()?


Solution

  • For converting data in Plone there is a tool called portal_transforms, which is quite capable (depending on your OS and installation it may also be able to convert .doc, .pdf, etc.):

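    from Products.CMFCore.utils import getToolByName

    # `html` is the raw HTML string taken from the field;
    # `self.context` is the content object being indexed.
    transforms = getToolByName(self.context, 'portal_transforms')
    stream = transforms.convertTo('text/plain', html, mimetype='text/html')
    # convertTo returns None when no transform chain is found
    text = stream.getData().strip() if stream is not None else ''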
    

    For indexing fields in Dexterity I suggest using collective.dexteritytextindexer (there is no through-the-web (TTW) support at the moment, though):
    http://pypi.python.org/pypi/collective.dexteritytextindexer
    https://github.com/collective/collective.dexteritytextindexer

    cheers
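
For reference, here is a minimal sketch of how the pieces could fit together in a custom SearchableText(): plain text values are joined with spaces, and HTML values are converted to text first. The names build_searchable_text and html_to_text are made up for this illustration. html_to_text is only a standard-library stand-in so the snippet runs outside Plone; inside Plone you would pass a callable that wraps the portal_transforms call instead.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join text nodes with a space so adjacent blocks do not fuse
        # into one word, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Hypothetical stand-in for the portal_transforms conversion."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_searchable_text(plain_values, html_values, to_text=html_to_text):
    """Return one space-separated string suitable for a full-text index.

    plain_values: iterable of plain text strings (empty or None are skipped)
    html_values:  iterable of HTML strings (empty or None are skipped)
    to_text:      callable turning an HTML string into plain text
    """
    parts = [value.strip() for value in plain_values if value]
    parts.extend(to_text(value) for value in html_values if value)
    return " ".join(part for part in parts if part)
```

In a Dexterity type, SearchableText() would collect the field values from the context and return the result of build_searchable_text(...). Joining text nodes with a space can split a word that is interrupted by inline tags, which seems an acceptable trade-off for a full-text index.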