Tags: ios, objective-c, xcode, ios5, ios6

Taking a link, removing all the clutter and keeping only the content: what's the best way to do this in Objective-C?


I want to take a link, grab its HTML, and keep only the part that matters, say the article itself. There are many HTML parsing libraries for Objective-C (hpple, for example), but I want to do more than just parse out specific elements; I need something that removes everything that isn't part of the readable content. Something like what Instapaper, Readability, Pocket or Safari's Reader feature do.

What would be the best way to accomplish this in Objective-C/iOS?


Solution

  • I'm not sure there's a ready-made way in Objective-C, but Readability had an open source JavaScript implementation that extracted the main content of web pages. See also this answer and the linked code (called "boilerpipe"), which may help you. It's written in Java, though.

    If you only need to pull the links out of a piece of text, use NSDataDetector with the NSTextCheckingTypeLink type to scan it.
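A minimal sketch of the NSDataDetector approach, assuming Foundation on iOS 4+ or OS X 10.7+ (for example `clang -framework Foundation links.m`). The helper name `LinksInText` and the sample string are hypothetical, chosen only for illustration:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every URL that NSDataDetector finds in `text`.
static NSArray *LinksInText(NSString *text) {
    NSError *error = nil;
    NSDataDetector *detector =
        [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink
                                        error:&error];
    if (detector == nil) {
        NSLog(@"Could not create detector: %@", error);
        return @[];
    }

    NSMutableArray *links = [NSMutableArray array];
    [detector enumerateMatchesInString:text
                               options:0
                                 range:NSMakeRange(0, text.length)
                            usingBlock:^(NSTextCheckingResult *match,
                                         NSMatchingFlags flags,
                                         BOOL *stop) {
        // A link match carries the detected URL in its URL property.
        if (match.resultType == NSTextCheckingTypeLink && match.URL != nil) {
            [links addObject:match.URL];
        }
    }];
    return links;
}

int main(void) {
    @autoreleasepool {
        NSString *text = @"Read https://example.com/article before lunch.";
        // Print each detected link on its own line.
        for (NSURL *url in LinksInText(text)) {
            NSLog(@"%@", url.absoluteString);
        }
    }
    return 0;
}
```

Note that the detector only scans text you already have; it does not download the page or strip any HTML clutter, so it complements rather than replaces a Readability-style extractor.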