pyqt5 semantic-markup tagging qtextedit qtextdocument

Tagging specific parts of QTextDocument


I have to edit a document which has been tagged semantically.

Assume I have an HTML document where some or all paragraphs (or spans) have been tagged with a specific class name, something like:

<p class="bio"><span class="name">John</span> <span class="surname">Doe</span> is a <span class="job">carpenter</span> living in <span class="place">York</span>.</p>
<p class="story">He was working at his bench when...</p>

I want to use a QTextEdit widget to edit such text (if possible).

Additional requirements are:

  • Each class should have specific graphic rendering (this should be easy using CSS).
  • Editing a specific <span> should preserve its class (e.g.: if I edit "John" -> "Jonathan" it should still have class="name").
  • I should be able to apply a class to a specific piece of text (e.g.: select some text, open a context menu and select one of the possible classes).
  • I should be able to remove tagging from a selection.
  • I should be able to serialize the edited text (i.e.: walk the edited text, recognize class changes and produce whatever markup I want).
  • Note that classes can be nested one inside another (but cannot partially overlap); this means some pieces of text have two (or more) classes.

Can this be achieved with standard means?

As far as I have seen, QTextDocument and its associated classes (e.g.: QTextFrame, QTextFormat, etc.) are geared toward visual representation (font style, color, etc.), while I need some "logical" tagging that may or may not be reflected in visual changes. I mean: the text can all be in the same font/color/background, but when I move the cursor over it I should be able to list all classes active at that specific place (if any).

I am coding in PyQt5, if this is relevant.

The only (rather ugly!) way I seem to see to achieve this is to use QTextCharFormat's tooltip property to store class(es) of each QTextFragment. Is there a better option?


Solution

  • For anyone having the same problem:

    QTextCharFormat (through its base class QTextFormat) has a generic property mechanism, accessed with setProperty() and property(), which can be used to hold arbitrary data.

    You need to:

    • define your own set of property ids (at or above QtGui.QTextFormat.UserProperty, to avoid clashing with existing properties).
    • set a value with: format.setProperty(mycode, myvalue)
    • read it back with: value = format.property(mycode)
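
The steps above can be sketched as follows. This is only a minimal illustration, not a complete editor: the names CLASS_PROPERTY, tag_range, class_at and iter_fragments are hypothetical, a single class name is stored as a plain string (nested classes would need something richer, e.g. a space-separated string), and the offscreen platform setting is just an assumption so the snippet can run without a display.

```python
import os
os.environ.setdefault("QT_QPA_PLATFORM", "offscreen")  # assumption: no display needed

from PyQt5.QtGui import (QGuiApplication, QTextCharFormat, QTextCursor,
                         QTextDocument, QTextFormat)

app = QGuiApplication.instance() or QGuiApplication([])

# Hypothetical property id; ids from UserProperty upward are left to the application.
CLASS_PROPERTY = int(QTextFormat.UserProperty) + 1


def tag_range(doc, start, end, class_name):
    """Attach class_name to the characters in [start, end)."""
    cursor = QTextCursor(doc)
    cursor.setPosition(start)
    cursor.setPosition(end, QTextCursor.KeepAnchor)
    fmt = QTextCharFormat()
    fmt.setProperty(CLASS_PROPERTY, class_name)
    # Merging only touches our property, so visual formatting is preserved.
    cursor.mergeCharFormat(fmt)


def class_at(doc, position):
    """Return the class stored on the character just before position."""
    cursor = QTextCursor(doc)
    cursor.setPosition(position)
    return cursor.charFormat().property(CLASS_PROPERTY)


def iter_fragments(doc):
    """Yield (text, class_name) for every fragment, in document order."""
    block = doc.begin()
    while block.isValid():
        it = block.begin()
        while not it.atEnd():
            fragment = it.fragment()
            if fragment.isValid():
                yield fragment.text(), fragment.charFormat().property(CLASS_PROPERTY)
            it += 1
        block = block.next()


doc = QTextDocument()
doc.setPlainText("John Doe is a carpenter.")
tag_range(doc, 0, 4, "name")
tag_range(doc, 5, 8, "surname")
fragments = list(iter_fragments(doc))
```

Walking the fragments like this is one possible basis for the serialization step: each fragment carries its text together with whatever was stored under the custom property, so any output markup can be generated from the pairs.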

    Other classes have similar (but NOT identical!) mechanisms (e.g.: QStandardItem stores per-role values with setData() and data()).

    IMPORTANT NOTE: if you are using PyQt there are severe restrictions on what you can store and safely retrieve. For example, storing a QTextDocument with QStandardItem.setData(doc, mycode) will not work reliably: only the reference is stored, and if the underlying Python object is garbage collected you'll have a nice crash (SIGSEGV).
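
One possible way around this restriction, sketched under assumptions, is to store only a plain value (here a string key) in the item and keep the real object alive in a Python container you own. The names DOC_KEY_ROLE, documents, attach_document and document_for are hypothetical, and the offscreen platform setting is again only there so the snippet runs without a display.

```python
import os
os.environ.setdefault("QT_QPA_PLATFORM", "offscreen")  # assumption: no display needed

from PyQt5.QtCore import Qt
from PyQt5.QtGui import QGuiApplication, QStandardItem, QTextDocument

app = QGuiApplication.instance() or QGuiApplication([])

# Hypothetical custom role; roles from Qt.UserRole upward are free for application use.
DOC_KEY_ROLE = int(Qt.UserRole) + 1

# This dict owns the documents, so they are not garbage collected
# while an item still refers to them by key.
documents = {}


def attach_document(item, key, doc):
    """Keep doc alive in the registry and store only its key in the item."""
    documents[key] = doc
    item.setData(key, DOC_KEY_ROLE)


def document_for(item):
    """Look up the document through the key stored in the item."""
    return documents.get(item.data(DOC_KEY_ROLE))


item = QStandardItem("story")
story = QTextDocument()
story.setPlainText("He was working at his bench")
attach_document(item, "doc-1", story)
```

Because the item only holds a string, nothing in it depends on the lifetime of a wrapped Qt object; the trade-off is that the registry has to be cleaned up by hand when items are removed.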