Search code examples
regexparsinghtml-parsinghtml-parser

Creating my own html parser


I know this post, I've already read it but still I'd like to learn what language does an html parser (may) use? I mean, does it parse the whole source with a regex or it uses a normal programming language such as c# or python?

Apart from the question above can you also brief me on from where I should start to create my own parser? (I'd like to create an html parser for my personal needs :)


Solution

  • Python, Java, and Perl are all fine languages for learning to write an HTML parser. Perl is very pleasant for regular expressions, but that's not what you need for a parser. It is a bit more pleasant to write OO programs in Python or Java. C/C++/C#, etc., are also common, for very fast parsers. However, as a learning exercise, I recommend Python or Java, so that you can compare your work with standard parsers.