html xml dom domparser

How is the DOM parsed?


Possible Duplicate:
If you're not supposed to use Regular Expressions to parse HTML, then how are HTML parsers written?

My question is simple: how do current DOM parsers actually build the DOM from a string (XML, HTML, or otherwise)?

I know you shouldn't parse HTML with regular expressions, but couldn't a DOM parser use them to match the patterns for opening and closing tags? Or is there a good single-pass algorithm that parses the provided string as a character array?


Solution

  • Look at this:

    Here is a good example
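
To make the "single pass over a character array" idea from the question concrete, here is a minimal sketch in Python. It is not how any particular browser or library does it: it assumes a tiny, well-formed, XML-like subset (no comments, CDATA, entities, or attribute values containing `>`), and the names `Node` and `parse` are made up for this illustration. Real HTML parsers add a much larger tokenizer state machine and error recovery on top of the same basic shape: scan left to right, emit tags and text, and keep a stack of open elements to build the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a tag name plus child nodes and text strings."""
    tag: str
    children: list = field(default_factory=list)


def parse(markup: str) -> Node:
    """Scan the string once, left to right, building a tree with a stack."""
    root = Node("#document")
    stack = [root]  # stack[-1] is the innermost element still open
    i, n = 0, len(markup)
    while i < n:
        if markup[i] == "<":
            end = markup.find(">", i)
            if end == -1:
                raise ValueError("unterminated tag")
            body = markup[i + 1:end].strip()
            if body.startswith("/"):
                # Closing tag: it must match the innermost open element.
                name = body[1:].strip()
                if len(stack) == 1 or stack[-1].tag != name:
                    raise ValueError(f"unexpected closing tag: {name!r}")
                stack.pop()
            else:
                # Opening or self-closing tag; attributes are ignored here.
                parts = body.rstrip("/").split()
                if not parts:
                    raise ValueError("empty tag")
                node = Node(parts[0])
                stack[-1].children.append(node)
                if not body.endswith("/"):
                    stack.append(node)
            i = end + 1
        else:
            # Text run: everything up to the next "<" (or end of input).
            end = markup.find("<", i)
            if end == -1:
                end = n
            text = markup[i:end]
            if text.strip():
                stack[-1].children.append(text)
            i = end
    if len(stack) != 1:
        raise ValueError(f"unclosed tag: {stack[-1].tag!r}")
    return root


doc = parse("<ul><li>one</li><li>two<br/></li></ul>")
```

The stack is the part a regular expression cannot provide: matching a closing tag against the most recently opened element requires remembering arbitrarily deep nesting, which is why parsers tend to use a regex (or a hand-written scanner) only to recognise individual tokens and leave the tree structure to code like the above.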