I'm trying to write a very simple iOS app that will parse a webpage (http://arxiv.org/list/cond-mat/recent) and display a simplified version of it. I chose to use TFHpple to parse this page. I want to get titles of papers and display them in the TableViewController. The HTML container for paper descriptions looks like:
<div class="list-title">
<span class="descriptor">Title:</span> Encoding Complexity within Supramolecular Analogues of Frustrated Magnets
</div>
The function I use to parse the page and extract the values is the following (thanks to raywenderlich.com):
    - (void)loadPapers {
        NSURL *papersURL = [NSURL URLWithString:@"http://www.arxiv.org/list/cond-mat/recent"];
        NSData *papersHTMLData = [NSData dataWithContentsOfURL:papersURL];
        TFHpple *papersParser = [TFHpple hppleWithHTMLData:papersHTMLData];
        NSString *papersXpathQueryString = @"//div[@class='list-title']";
        NSArray *papersNodes = [papersParser searchWithXPathQuery:papersXpathQueryString];
        NSMutableArray *newPapers = [[NSMutableArray alloc] initWithCapacity:0];
        for (TFHppleElement *element in papersNodes) {
            Paper *paper = [[Paper alloc] init];
            [newPapers addObject:paper];
            paper.title = [[element firstChild] content];
        }
        _objects = newPapers;
        [self.tableView reloadData];
    }
This function is supposed to parse the entire HTML page and load the data into the table view. However, when I run it, the papersNodes array is filled with empty objects. The number of elements is correct (~25), but they are all empty, and I am not sure why.
Any help is greatly appreciated! Thanks!
I have rewritten your code with HTMLKit. It looks like this:
    NSURL *papersURL = [NSURL URLWithString:@"http://www.arxiv.org/list/cond-mat/recent"];
    NSData *papersHTMLData = [NSData dataWithContentsOfURL:papersURL];
    NSString *htmlString = [[NSString alloc] initWithData:papersHTMLData encoding:NSUTF8StringEncoding];
    HTMLDocument *document = [HTMLDocument documentWithString:htmlString];
    NSArray *divs = [document querySelectorAll:@"div[class='list-title']"];
    for (HTMLElement *element in divs) {
        NSLog(@"%@", element.textContent);
    }
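Note that textContent returns all the text inside the div, so each logged string will most likely still contain the "Title:" descriptor from the span, plus surrounding whitespace. If you only want the title itself, something along these lines might work. This is just a sketch using plain Foundation string methods; PaperTitleFromText is a name I made up for illustration, not part of HTMLKit, and it assumes the descriptor text is exactly "Title:" as in your HTML snippet.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of HTMLKit): strips the "Title:" descriptor
// and the surrounding whitespace from the text of a list-title div.
// Assumes the descriptor is exactly "Title:", as in the HTML in the question.
static NSString *PaperTitleFromText(NSString *text) {
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    // Drop leading/trailing newlines and spaces first, so the prefix check works.
    NSString *title = [text stringByTrimmingCharactersInSet:whitespace];

    NSString *descriptor = @"Title:";
    if ([title hasPrefix:descriptor]) {
        title = [title substringFromIndex:descriptor.length];
    }

    // Remove the space that separated the descriptor from the title.
    return [title stringByTrimmingCharactersInSet:whitespace];
}
```

Inside the loop you could then write `paper.title = PaperTitleFromText(element.textContent);` instead of logging the raw text.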
Back to your question in the comment:
Could you give some useful links that you find good to learn about HTMLKit?
You can check out the examples on the project's GitHub page. The source code is documented and using it is relatively straightforward. If you have basic HTML and CSS experience, HTMLKit should come easily. Unfortunately, there are no other resources for learning it yet.