TFHpple - xpath in iOS - get two parents up


I'm looking for a way to go two parents up (to the grandparent) or two children down.

<table style="background-color: #008000; border-style: none;" border="0" cellpadding="2" cellspacing="2">
  <tr>
    <td>
      <img height="5" width="5" border="0" src="https://spacer.gif" title="07:00,24hrs: B Shift /.../E704/RS704/Firefighter #2" alt="07:00,24hrs: B Shift /.../E704/RS704/Firefighter #2">
    </td>
  </tr>
</table>
<img height="2" width="2" border="0" src="https://spacer.gif" alt="">
  <img alt="" height="1" width="1" border="0" src="https://spacer.gif">
    </td>
    <TD ALIGN="RIGHT" VALIGN="TOP" width="17%">
      <a href="javascript:void(0)" title="01/15/2013" class="daylink" onClick="return DayClick('01/15/2013');">15</a>
    </TD>
    <td rowspan="2" width="5">
      <img alt="" height="1" width="1" border="0" src="https://spacer.gif">
    </td>
    </tr>
    <tr>
      <TD COLSPAN="2">
        <TABLE>
          <TR>
            <TD style="background-color: #41FFB9; " class="calexception">
              <a href="javascript:void(0)" onClick="return ShowRemoveExceptionWindow(&quot;4A30E80.fre01&quot;,&quot;3280530&quot;);" title="10hrs DetNonEMSStud(10),  07:00 - 17:00" style="color: #000000; text-decoration: none; font-weight: bold;">DetNonEMSStud(10)</a>
            </TD>
          </TR>
        </TABLE>

iOS:

NSString *tutorialsXpathQueryString = @"//table/tr/td";
NSArray *tutorialsNodes = [tutorialsParser searchWithXPathQuery:tutorialsXpathQueryString];
NSLog(@"here is url: %@", tutorialsNodes);

NSMutableArray *newTutorials = [[NSMutableArray alloc] initWithCapacity:0];
for (TFHppleElement *element in tutorialsNodes) {

    Tutorial *tutorial = [[Tutorial alloc] init];
    [newTutorials addObject:tutorial];

    tutorial.url = [element objectForKey:@"style"];
    tutorial.title = [[element firstChild] objectForKey:@"title"];

I'm getting the style from <td> and the title from <a> just fine. I also need to get the style from the enclosing <table>.

I'm very new to Objective-C and this is my first attempt with XPath, so any sample would be great!


Solution

  • I have not used TFHpple, but if it supports standard XPath, you should be able to go from a td to its containing table with this XPath:

    ancestor::table[1]
    

    This XPath selects the nearest table that is an ancestor of the context node, so if you had this markup:

    <table>
        <tr><td></td></tr>
        <tr>
           <td>
             <table><tr><td></td></tr></table>
             <table>
                 <tr>
                     <td>Hey!</td>
                 </tr>
             </table>
           </td>
        </tr>      
    </table>
    

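    And your context node was the td with the text "Hey!", then the above XPath would select the table that opens on line 6 of that markup: the innermost table containing that td, not the outer table or the sibling table on line 5.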
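
    To make "nearest ancestor" concrete, here is a minimal sketch of what ancestor::table[1] computes, written as a plain walk up the parent chain. DemoNode and NearestAncestor are hypothetical names introduced only for this illustration; DemoNode is a stand-in that mirrors the parent and tagName accessors the question's element objects expose, so the snippet needs nothing beyond Foundation. It assumes ARC.

```objc
#import <Foundation/Foundation.h>

// DemoNode is a hypothetical stand-in for a parsed element (assumes ARC).
// It only models what the walk needs: a tag name and a parent pointer.
@interface DemoNode : NSObject
@property (nonatomic, copy) NSString *tagName;
@property (nonatomic, assign) DemoNode *parent;          // non-owning back-pointer
@property (nonatomic, strong) NSMutableArray *children;  // owns the child nodes
- (instancetype)initWithTagName:(NSString *)tagName;
- (DemoNode *)addChildWithTagName:(NSString *)tagName;
@end

@implementation DemoNode
- (instancetype)initWithTagName:(NSString *)tagName {
    self = [super init];
    if (self) {
        _tagName = [tagName copy];
        _children = [NSMutableArray array];
    }
    return self;
}

// Creates a child node, links it to this node, and returns the child.
- (DemoNode *)addChildWithTagName:(NSString *)tagName {
    DemoNode *child = [[DemoNode alloc] initWithTagName:tagName];
    child.parent = self;
    [self.children addObject:child];
    return child;
}
@end

// Returns the nearest ancestor of `node` whose tag name matches `tagName`
// (compared case-insensitively), or nil if no such ancestor exists.
// With `node` as the context node, this is the element that
// ancestor::<tagName>[1] selects.
static DemoNode *NearestAncestor(DemoNode *node, NSString *tagName) {
    NSString *wanted = [tagName lowercaseString];
    DemoNode *current = node.parent;
    while (current != nil &&
           ![[current.tagName lowercaseString] isEqualToString:wanted]) {
        current = current.parent;
    }
    return current;
}
```

    Built over the markup above, NearestAncestor(heyCell, @"table") returns the inner table that directly contains the "Hey!" cell, and it returns nil when called on the outermost table, which has no table ancestor.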

    It looks like TFHpple doesn't provide a way to evaluate XPath on a context node. Given that, here is a different suggestion: every element has exactly one parent, so if you keep going up through the parents, you will eventually reach the table (if there is one). Going downward is much harder, because an element can have any number of direct children, each with its own set of children. I don't really know Objective-C, but if you can be sure that the table is exactly two levels up, then something like this would probably work:

    TFHppleElement *table = [[element parent] parent];
    

    If there's no guarantee that the table is two levels up, then there should be some way to find the table by going up through the parents. This is pseudocode, but hopefully you get the idea:

    for (TFHppleElement *element in tutorialsNodes) {
    
        Tutorial *tutorial = [[Tutorial alloc] init];
        [newTutorials addObject:tutorial];
    
        tutorial.url = [element objectForKey:@"style"];
        tutorial.title = [[element firstChild] objectForKey:@"title"];
    
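        // Walk up the parent chain until we reach a table or run out of parents.
        // The tag name is lowercased first because the markup mixes <table> and <TABLE>.
        TFHppleElement *table = [element parent];
        while (table != nil && ![[[table tagName] lowercaseString] isEqualToString:@"table"]) {
            table = [table parent];
        }
    
        // table should either be the parent table at this point, 
        //  or nil if there was no parent table.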