How do I return the string within 'MTEXT' with ezdxf?


I am using the code below to return the string from every TEXT item within a .dxf file:

    for i in m_space.query('TEXT'):
        return(str(i.dxf.text))

This is working well, so I would like to do the same for all MTEXT items. From reading the docs, I have put together the following:

    for i in m_space.query('MTEXT'):
        return(str(i.text))

But the output seems to include some additional data. I could use a regex to extract the text I need, but I would like to know whether there is a better way built into ezdxf.

>>>   '{\\Fsimplex|c0;TEXT THAT I WANT}'

Solution

  • The additional information that you are seeing within the MText content is MText formatting codes.

    When formatting overrides are applied through the MText editor (as opposed to being applied to the Text Style referenced by the MText object), the formatting is encoded using formatting codes embedded within the text content. These codes are not displayed in AutoCAD; they control how the enclosed section of the text content is rendered. In your case, the formatting code:

    {\\Fsimplex|c0;TEXT THAT I WANT}
    

    results in the string TEXT THAT I WANT being displayed using the simplex font.

    As far as I'm aware, ezdxf does not include methods that will allow you to obtain the text content with all formatting codes removed, but after obtaining the content using the text property, you can use Regular Expressions to remove such codes.

    To offer an existing example, I've previously developed the following AutoLISP function which uses Regular Expressions to remove all formatting codes, but there are likely other ways to phrase the RegEx patterns and obtain the same result:

    ;; Quick Unformat  -  Lee Mac
    ;; Returns a string with all MText formatting codes removed.
    ;; rgx - [vla] Regular Expressions (RegExp) Object
    ;; str - [str] String to process
    
    (defun LM:quickunformat ( rgx str )
        (if
            (null
                (vl-catch-all-error-p
                    (setq str
                        (vl-catch-all-apply
                           '(lambda nil
                                (vlax-put-property rgx 'global     actrue)
                                (vlax-put-property rgx 'multiline  actrue)
                                (vlax-put-property rgx 'ignorecase acfalse) 
                                (foreach pair
                                   '(
                                        ("\032"     . "\\\\\\\\")
                                        (" "        . "\\\\P|\\n|\\t")
                                        ("$1"       . "\\\\(\\\\[ACcFfHKkLlOopQTW])|\\\\[ACcFfHKkLlOopQTW][^\\\\;]*;|\\\\[ACcFfKkHLlOopQTW]")
                                        ("$1$2/$3"  . "([^\\\\])\\\\S([^;]*)[/#\\^]([^;]*);")
                                        ("$1$2"     . "\\\\(\\\\S)|[\\\\](})|}")
                                        ("$1"       . "[\\\\]({)|{")
                                        ("\\$1$2$3" . "(\\\\[ACcFfHKkLlOoPpQSTW])|({)|(})")
                                        ("\\\\"     . "\032")
                                    )
                                    (vlax-put-property rgx 'pattern (cdr pair))
                                    (setq str (vlax-invoke rgx 'replace str (car pair)))
                                )
                            )
                        )
                    )
                )
            )
            str
        )
    )
    

    For your sample text string, the above would return:

    _$ (setq rgx (vlax-create-object "vbscript.regexp"))
    #<VLA-OBJECT IRegExp2 00000000315de460>
    _$ (LM:quickunformat rgx "{\\Fsimplex|c0;TEXT THAT I WANT}")
    "TEXT THAT I WANT"
    _$ (vlax-release-object rgx)
    0
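
Since the question uses Python, here is a rough Python sketch of the same idea using the standard `re` module. It is not a port of the AutoLISP function above: the function name `strip_mtext_codes` and the set of codes it handles are my own assumptions. It covers only the common inline codes (font, height, color and similar codes with an argument, underline-style toggles, stacked text, paragraph breaks and grouping braces), so treat it as a starting point rather than a complete MText parser.

```python
import re

# One pattern covering the common MText inline codes (an assumed subset).
_MTEXT_CODE = re.compile(
    r"\\(?P<esc>[\\{}])"                        # escaped \, { or }: keep the literal
    r"|\\S(?P<num>[^;]*?)[/#^](?P<den>[^;]*);"  # stacked text, e.g. \S1/2;
    r"|\\[ACcFfHQTWp][^\\;]*;"                  # codes with a ';'-terminated argument
    r"|\\[LlOoKk]"                              # underline/overline/strike toggles
    r"|(?P<space>\\[P~])"                       # paragraph break, non-breaking space
    r"|[{}]"                                    # grouping braces
)


def _replace(match):
    """Return the plain-text replacement for one matched formatting code."""
    if match.group("esc"):
        return match.group("esc")
    if match.group("num") is not None:
        # Flatten stacked text to "numerator/denominator".
        return f"{match.group('num')}/{match.group('den')}"
    if match.group("space"):
        return " "
    return ""


def strip_mtext_codes(text: str) -> str:
    """Hypothetical helper: remove common MText formatting codes from text."""
    return _MTEXT_CODE.sub(_replace, text)


if __name__ == "__main__":
    print(strip_mtext_codes(r"{\Fsimplex|c0;TEXT THAT I WANT}"))  # TEXT THAT I WANT
```

In the loop from the question, this would be called as strip_mtext_codes(i.text). Paragraph breaks (\P) are replaced with a space here, mirroring the AutoLISP function; change that branch if you would rather keep line breaks. It may also be worth checking the documentation for your ezdxf version: I believe more recent releases provide a plain_text() method on MText entities, which would make the regex unnecessary.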