plone archetypes

What is the canonical way to get text from a RichText field with Archetypes?


With Dexterity content types, the canonical way is to use the transformer:

from plone.app.textfield.interfaces import IRichTextValue, ITransformer


def get_text_field(obj):
    """Get text field in object on Dexterity."""
    transformer = ITransformer(obj)
    text = ''
    if IRichTextValue.providedBy(obj.text):  # Dexterity
        text = transformer(obj.text, 'text/plain')
    return text

But I can't find the canonical way to do it with Archetypes: the transformer doesn't work with the raw HTML, only with a RichTextValue object.

My current approach is to use lxml.html to convert the HTML into text, but I don't know whether it works as it should:

def get_text_field(obj):
    """Get text field in object on both, Archetypes and Dexterity."""
    text = ''
    try:
        raw = obj.getText()  # Archetypes
        if raw != '':
            from lxml import html
            el = html.fromstring(raw)
            text = el.text_content()
    except AttributeError:
        from plone.app.textfield.interfaces import IRichTextValue, ITransformer
        if IRichTextValue.providedBy(obj.text):  # Dexterity
            transformer = ITransformer(obj)
            text = transformer(obj.text, 'text/plain')
    return text

Solution

  • In Archetypes the regular getter does this for you.

    If you call getText on an AT type that has a text field, you get the transformed value back. See https://github.com/plone/Products.Archetypes/blob/e9ad0f4e76544b7890835ca93d25adeca4fc064f/Products/Archetypes/Field.py#L1564

    It uses the mimetype specified on the field.

    If the output type is text/html and you want text/plain, call the field getter with the mimetype parameter:

    obj.getField('text').get(obj, mimetype='text/plain')
    

    Further: obj.getRawText() returns the actual stored content, like obj.text.raw on a DX content item with a RichTextValue.

    You can also check whether the content provides IBaseObject instead of catching AttributeError.
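Putting the points above together, a helper covering both frameworks might look like the sketch below. It is not a definitive implementation: the function name get_plain_text and its fieldname parameter are made up for illustration, and the guarded imports are an assumption so that the module still loads on a site where only one of the two frameworks is installed.

```python
try:
    from Products.Archetypes.interfaces import IBaseObject
except ImportError:  # assumption: Archetypes may not be installed
    IBaseObject = None

try:
    from plone.app.textfield.interfaces import IRichTextValue, ITransformer
except ImportError:  # assumption: plone.app.textfield may not be installed
    IRichTextValue = ITransformer = None


def get_plain_text(obj, fieldname='text'):
    """Return the text/plain rendering of a rich text field, or ''.

    Hypothetical helper; 'fieldname' defaults to the 'text' field
    used in the question.
    """
    if IBaseObject is not None and IBaseObject.providedBy(obj):
        # Archetypes: the field getter runs the transform for us.
        field = obj.getField(fieldname)
        if field is None:
            return ''
        return field.get(obj, mimetype='text/plain')

    value = getattr(obj, fieldname, None)
    if IRichTextValue is not None and IRichTextValue.providedBy(value):
        # Dexterity: adapt the context to ITransformer, as in the question.
        return ITransformer(obj)(value, 'text/plain')

    # Neither an AT object nor a RichTextValue: nothing to transform.
    return ''
```

Checking the interface first keeps the Archetypes and Dexterity branches explicit, and avoids hiding an unrelated AttributeError raised inside getText.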