Given a URL of a webpage, I need to get the HTML between an opening <div> and a closing </div> of a particular class.
I think that if I can get the whole HTML of the page as a string, I could use a RegEx to extract the HTML inside the <div> with that class and return it as a string.
How could we achieve this using Objective-C and RegExes?
For the parsing part, I have 3 words for you:
Don't try it
Read Parsing HTML the Cthulhu Way (by Jeff himself) and see this ever-famous SO answer. For libraries, use HTML::Sanitizer.
On the other hand, most programs will neither need to, nor should, anticipate the entire universe of HTML when parsing. In fact, designing a program to do so may well be a completely wrong-headed approach, if it changes a program from a few-line script to a bullet-proof commercial-grade program which takes orders of magnitude more time to properly code and support. Resource expenditure should always (oops, make that very frequently, I almost overgeneralized, too) be considered when creating a programmatic solution. In addition, hard boundaries need not always be an HTML-oriented limitation. They can be as simple as "work with these sets of web pages", "work with this data from these web pages", "work for 98% of users 98% of the time", or even "OMG, we have to make this work in the next hour, do the best you can".
So if you're parsing something as simple as icanhazip, a regex is an option. It might also work if the page is small, or if the content is static and you control its markup. That's for you to choose. Good luck!
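If you do decide the page is simple enough, here is a minimal sketch of the regex route using Foundation's NSRegularExpression. It rests on assumptions that are mine, not the question's: the class attribute holds exactly one class name, the target <div> contains no nested <div> (the lazy match stops at the first </div> it sees), and the page is UTF-8. The helper name InnerHTMLOfDiv, the URL, and the class name "content" are all placeholders.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the inner HTML of the first <div> whose class
// attribute is exactly `className`, or nil if nothing matches.
// Assumption: the target <div> has no nested <div> elements inside it.
static NSString *InnerHTMLOfDiv(NSString *html, NSString *className) {
    // Escape the class name so regex metacharacters in it are taken literally.
    NSString *escaped = [NSRegularExpression escapedPatternForString:className];
    NSString *pattern = [NSString stringWithFormat:
        @"<div\\b[^>]*\\sclass\\s*=\\s*[\"']%@[\"'][^>]*>(.*?)</div>", escaped];

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:(NSRegularExpressionCaseInsensitive |
                                                           NSRegularExpressionDotMatchesLineSeparators)
                                                    error:&error];
    if (regex == nil) {
        return nil;
    }

    NSTextCheckingResult *match =
        [regex firstMatchInString:html
                          options:0
                            range:NSMakeRange(0, html.length)];
    if (match == nil) {
        return nil;
    }

    // Capture group 1 is everything between the opening tag and the first </div>.
    return [html substringWithRange:[match rangeAtIndex:1]];
}

int main(void) {
    @autoreleasepool {
        // Placeholder URL and class name; substitute your own.
        NSURL *url = [NSURL URLWithString:@"https://example.com/"];
        NSError *error = nil;

        // Synchronous fetch: fine for a command-line sketch, but do not
        // call this on the main thread of an app.
        NSString *html = [NSString stringWithContentsOfURL:url
                                                  encoding:NSUTF8StringEncoding
                                                     error:&error];
        if (html == nil) {
            NSLog(@"Could not load page: %@", error);
            return 1;
        }

        NSString *inner = InnerHTMLOfDiv(html, @"content");
        NSLog(@"%@", inner ?: @"(no matching div found)");
    }
    return 0;
}
```

The moment the target <div> contains another <div>, or the class attribute lists several classes, this breaks, which is exactly why a real HTML parser is the safer choice for anything beyond pages you control.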