
How to extract textual content from an SGML DTD using Perl?


I want to extract all the content from a DTD using Perl, but I'm not sure of the best way to go about it. I know there are modules for working with XML, but I don't know whether any exist for this kind of work with SGML, or whether I should try to write a regular expression instead.

I'm new to both SGML and Perl, and I don't have much experience with regex beyond very simple pattern matching.


Solution

  • You have 2 options here:

    • Use the old perlSGML distribution, which I have used in the (remote!) past. This being Perl, it should still run on a modern perl.

    • Convert your SGML to XML using osx, which is part of OpenSP and is packaged for at least Debian/Ubuntu (the package is called opensp) and most likely other platforms, then use XML tools like XML::LibXML or XML::Twig.

    There are a lot more XML tools than SGML tools these days, but of course you may lose some information, since DTDs are slightly simpler in XML than in SGML.
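
    As a rough illustration of the second option, here is a minimal sketch of the Perl half of that pipeline. It assumes the SGML has already been converted with something like `osx input.sgml > converted.xml` (the file names are placeholders, and the exact osx invocation and options should be checked against its man page). The `extract_text` helper is a hypothetical name made up for this example, not part of XML::LibXML.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Assumed prior step (outside this script; file names are placeholders):
#   osx input.sgml > converted.xml

# Hypothetical helper: return the non-empty text chunks of a parsed
# XML document, in document order, with surrounding whitespace trimmed.
sub extract_text {
    my ($doc) = @_;
    my @chunks;
    for my $node ($doc->findnodes('//text()')) {
        my $text = $node->data;
        $text =~ s/^\s+|\s+$//g;              # trim leading/trailing whitespace
        push @chunks, $text if length $text;  # skip whitespace-only nodes
    }
    return @chunks;
}

# When run with a file argument, print each text chunk on its own line.
if (@ARGV) {
    my $doc = XML::LibXML->load_xml(location => $ARGV[0]);
    print "$_\n" for extract_text($doc);
}
```

    Run it as `perl extract_text.pl converted.xml` (the script name is arbitrary). XML::Twig would work just as well here; XML::LibXML is used only because a single XPath expression is enough to collect every text node.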