I am using the code below to return a string from all TEXT items within a .dxf file:
for i in m_space.query('TEXT'):
    return(str(i.dxf.text))
This is working well, so I would like to do the same for all MTEXT items. From reading the docs, I have put together the following:
for i in m_space.query('MTEXT'):
    return(str(i.text))
But the output seems to include some additional data. I could use a regex to extract the text I need, but I would like to know whether there is a better way built into ezdxf:
>>> '{\\Fsimplex|c0;TEXT THAT I WANT}'
The additional information that you are seeing within the MText content is MText formatting codes.
When formatting overrides are applied through the MText editor (as opposed to being applied to the Text Style referenced by the MText object), the formatting is encoded using formatting codes embedded within the text content. These codes are not displayed in AutoCAD; instead, they control how the section of text enclosed by the code is rendered. In your case, the formatting code:
{\\Fsimplex|c0;TEXT THAT I WANT}
results in the string TEXT THAT I WANT being displayed using the simplex font. (The doubled backslash is Python's escaped representation of the single backslash held in the stored content.)
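To make the structure of such a code concrete, here is a small Python sketch that splits the sample string above into its code letter, its argument, and the visible text. The regular expression and the variable names are illustrative assumptions, and the pattern only covers this one simple shape (a single code wrapping the whole string), not MText content in general.

```python
import re

# The stored MText content; the raw string holds a single backslash.
content = r"{\Fsimplex|c0;TEXT THAT I WANT}"

# Illustrative pattern: "{", a backslash, one code letter, an argument that
# runs up to ";", the visible text, and the closing "}".
match = re.fullmatch(r"\{\\([A-Za-z])([^;]*);(.*)\}", content)
code, argument, visible = match.groups()

# The argument holds the font name followed by "|"-separated flags.
font_name, *flags = argument.split("|")

print(code)       # F
print(font_name)  # simplex
print(flags)      # ['c0']
print(visible)    # TEXT THAT I WANT
```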
As far as I'm aware, ezdxf does not include methods which will allow you to obtain the text content with all formatting codes removed, but upon obtaining the content using the text property, you can then use Regular Expressions to remove such codes.
To offer an existing example, I've previously developed the following AutoLISP function which uses Regular Expressions to remove all formatting codes, but there are likely other ways to phrase the RegEx patterns and obtain the same result:
;; Quick Unformat - Lee Mac
;; Returns a string with all MText formatting codes removed.
;; rgx - [vla] Regular Expressions (RegExp) Object
;; str - [str] String to process
(defun LM:quickunformat ( rgx str )
    (if
        (null
            (vl-catch-all-error-p
                (setq str
                    (vl-catch-all-apply
                       '(lambda nil
                            (vlax-put-property rgx 'global actrue)
                            (vlax-put-property rgx 'multiline actrue)
                            (vlax-put-property rgx 'ignorecase acfalse)
                            (foreach pair
                               '(
                                    ("\032" . "\\\\\\\\")
                                    (" " . "\\\\P|\\n|\\t")
                                    ("$1" . "\\\\(\\\\[ACcFfHKkLlOopQTW])|\\\\[ACcFfHKkLlOopQTW][^\\\\;]*;|\\\\[ACcFfKkHLlOopQTW]")
                                    ("$1$2/$3" . "([^\\\\])\\\\S([^;]*)[/#\\^]([^;]*);")
                                    ("$1$2" . "\\\\(\\\\S)|[\\\\](})|}")
                                    ("$1" . "[\\\\]({)|{")
                                    ("\\$1$2$3" . "(\\\\[ACcFfHKkLlOoPpQSTW])|({)|(})")
                                    ("\\\\" . "\032")
                                )
                                (vlax-put-property rgx 'pattern (cdr pair))
                                (setq str (vlax-invoke rgx 'replace str (car pair)))
                            )
                        )
                    )
                )
            )
        )
        str
    )
)
For your sample text string, the above would return:
_$ (setq rgx (vlax-create-object "vbscript.regexp"))
#<VLA-OBJECT IRegExp2 00000000315de460>
_$ (LM:quickunformat rgx "{\\Fsimplex|c0;TEXT THAT I WANT}")
"TEXT THAT I WANT"
_$ (vlax-release-object rgx)
0
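Since the question is about Python, the same idea can be sketched with the standard `re` module. This is a simplified sketch rather than a line-for-line port of the AutoLISP function above: the name `strip_mtext_codes`, the placeholder character, and the split of codes into "toggles" and "codes with an argument" are my own assumptions, and the function returns plain display text (escaped backslashes and braces are unescaped). It has only been reasoned through for simple cases such as your sample string.

```python
import re

# Temporary stand-in for an escaped backslash (\\), so that later patterns
# do not mistake it for the start of a formatting code.
_PLACEHOLDER = "\x1a"


def strip_mtext_codes(content: str) -> str:
    """Return MText content with common inline formatting codes removed."""
    # 1. Protect escaped backslashes.
    text = content.replace("\\\\", _PLACEHOLDER)
    # 2. Paragraph breaks (\P), newlines and tabs become single spaces.
    text = re.sub(r"\\P|\n|\t", " ", text)
    # 3. Stacked text such as \S1/2; (also # or ^ separators) becomes 1/2.
    text = re.sub(r"\\S([^;]*?)[/#^]([^;]*);", r"\1/\2", text)
    # 4. Codes that carry an argument terminated by ";" (font, colour, height, ...).
    text = re.sub(r"\\[ACcFfHpQTW][^\\;]*;", "", text)
    # 5. Toggle codes without an argument (underline, overline, strike-through).
    text = re.sub(r"\\[LlOoKk]", "", text)
    # 6. Keep escaped braces (\{ and \}) as literals; drop grouping braces.
    text = re.sub(r"\\([{}])|[{}]", lambda m: m.group(1) or "", text)
    # 7. Restore protected backslashes as single literal backslashes.
    return text.replace(_PLACEHOLDER, "\\")


print(strip_mtext_codes("{\\Fsimplex|c0;TEXT THAT I WANT}"))  # TEXT THAT I WANT
```

In the loop from the question, this would be applied as `strip_mtext_codes(i.text)` for each MTEXT entity. It is also worth checking the documentation for your installed ezdxf version, as more recent releases document a `plain_text()` method on MText entities that may make the regex unnecessary.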