I am trying to pull data from a website using Objective-C. This is all very new to me, so I've done some research. What I know now is that I need to use XPath, and I found a wrapper for it on the iPhone called hpple. I've got it up and running in my project.
I am confused about the way I retrieve information from the site. Apparently I am to use regular expressions in this line of code:
NSArray * a = [doc search:@"//a[@class='sponsor']"];
This is just an example. Is that stuff in the search:@"...." the regular expression? If so, I guess I can develop the hundreds of patterns that I will need for my program to parse the site (I need a lot of data), but is there a better way? I'm very lost in this. Any help is appreciated.
The parameter is an XPath, not a regular expression. Here's a breakdown:
// is an abbreviation meaning "all descendants".
a means "all child elements named 'a'" (in HTML, that's anchors).
[...] contains a predicate, refining just which a elements to match.
@ is an abbreviation for attribute nodes.
@class means an attribute named "class".
@class='sponsor' means a class attribute equal to "sponsor". Note this will not match nodes whose class merely contains "sponsor", such as <a class="big sponsor" ...>; the class must be equal.

All together, we have "'a' nodes descending from the root that have class equal to 'sponsor'".
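To see the exact-match behaviour described above, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree rather than hpple, purely so it can be run anywhere. The sample markup and the variable names are invented for illustration. ElementTree needs a leading "." on the path, but the expression is otherwise the one from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this illustration.
SAMPLE = """
<html>
  <body>
    <div>
      <a class="sponsor" href="/one">One</a>
      <a class="big sponsor" href="/two">Two</a>
      <a class="other" href="/three">Three</a>
    </div>
    <a class="sponsor" href="/four">Four</a>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)

# Exact match: only anchors whose class attribute is exactly "sponsor".
exact = root.findall(".//a[@class='sponsor']")
exact_hrefs = [a.get("href") for a in exact]

# Token match, done in Python: any anchor whose space-separated
# class list includes "sponsor".
token = [a for a in root.iter("a") if "sponsor" in a.get("class", "").split()]
token_hrefs = [a.get("href") for a in token]

print(exact_hrefs)  # ['/one', '/four']
print(token_hrefs)  # ['/one', '/two', '/four']
```

The anchor with class "big sponsor" is skipped by the XPath predicate and only picked up by the token check. In a full XPath 1.0 engine (hpple sits on top of libxml2, so this should apply there, though I have not tested it with hpple), the same token match is usually written as //a[contains(concat(' ', normalize-space(@class), ' '), ' sponsor ')].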