ios parsing tfhpple

iOS HTML/XML parsing of Google Shopping results with TFHpple


Is there any way to parse Google Shopping results using TFHpple without the (deprecated) Google API, simply by using a URL such as https://www.google.com/search?hl=en&tbm=shop&q=AudiR8 ?

I've tried many different tags in the XPath query:

...
myCar = @"Audi R8";
myURL = [NSString stringWithFormat:@"https://www.google.com/search?hl=en&tbm=shop&q=%@",myCar];
NSData *htmlData = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:myURL]];
TFHpple *xpath = [[TFHpple alloc] initWithHTMLData:htmlData];
// Use XPath to search for elements
NSArray *elements = [xpath searchWithXPathQuery:@"//html//body"]; // <-- tags
...

but nothing works; the console always prints the same message: UNABLE TO PARSE.


Solution

  • I found several problems and eventually solved all of them. First of all, the URL has to be percent-encoded by adding:

    myURL = [myURL stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
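
    stringByAddingPercentEscapesUsingEncoding: has been deprecated since iOS 9. On newer SDKs the same step could be sketched with NSURLComponents, which percent-encodes the query values itself. The helper name GShoppingURLForQuery is hypothetical and not part of the original code:

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: builds the Google Shopping search URL for a query.
    // NSURLComponents percent-encodes the query items, so a space in the
    // search term (e.g. "Audi R8") no longer produces an invalid URL.
    static NSURL *GShoppingURLForQuery(NSString *query) {
        NSURLComponents *components =
            [NSURLComponents componentsWithString:@"https://www.google.com/search"];
        components.queryItems = @[
            [NSURLQueryItem queryItemWithName:@"hl" value:@"en"],
            [NSURLQueryItem queryItemWithName:@"tbm" value:@"shop"],
            [NSURLQueryItem queryItemWithName:@"q" value:query],
        ];
        return components.URL;
    }
    ```

    With this helper, GShoppingURLForQuery(@"Audi R8") yields a URL whose query ends in q=Audi%20R8, which can be passed straight to the download call.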
    

    Next, the original (and, at the time of writing, current) TFHpple code, specifically XPathQuery.m, crashes during parsing whenever nodeContent or raw is nil. To avoid this crash I replaced

    [resultForNode setObject:currentNodeContent forKey:@"nodeContent"];
    

    with (note that both [resultForNode ... lines need this change):

    if (currentNodeContent != nil)
       [resultForNode setObject:currentNodeContent forKey:@"nodeContent"];
    

    and:

    [resultForNode setObject:rawContent forKey:@"raw"];
    

    with:

    if (rawContent != nil)
       [resultForNode setObject:rawContent forKey:@"raw"];
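
    The crash happens because NSMutableDictionary raises an NSInvalidArgumentException when setObject:forKey: receives a nil object. The two guards could also be factored into one small helper; SetObjectIfNotNil is a hypothetical name and not part of TFHpple:

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper (not part of TFHpple): stores the value only when it
    // is non-nil, because -setObject:forKey: throws on a nil object.
    static void SetObjectIfNotNil(NSMutableDictionary *dict, id object, NSString *key) {
        if (object != nil) {
            [dict setObject:object forKey:key];
        }
    }
    ```

    In XPathQuery.m the two calls would then read SetObjectIfNotNil(resultForNode, currentNodeContent, @"nodeContent") and SetObjectIfNotNil(resultForNode, rawContent, @"raw").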
    

    Note that, because the HTML Google generates is quite complex, I decided to use these XPath queries:

    ...
    NSArray *elementsImages = [xpath searchWithXPathQuery:@"//html//*[@class=\"psliimg\"]"];
    NSArray *elementsPrices = [xpath searchWithXPathQuery:@"//html//*[@class=\"psliprice\"]"];
    ...
    

    Another problem appears when you use a for or while loop to retrieve several HTML pages. In fact, if you use:

    NSData *htmlData = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:myURL]];
    

    initWithContentsOfURL: many times inside the loop, the page is not always fetched correctly (and the debug console prints the familiar UNABLE TO PARSE), so I decided to replace it with:

    // Send a synchronous request
    NSURLRequest *urlRequest = [NSURLRequest requestWithURL:[NSURL URLWithString:myURL]];
    NSURLResponse *response = nil;
    NSError *error = nil;
    NSData *data = [NSURLConnection sendSynchronousRequest:urlRequest
                                         returningResponse:&response
                                                     error:&error];

    // Check the returned data rather than the error object: by Cocoa
    // convention the error is only meaningful when the call itself fails
    if (data != nil)
    {
        // Parse data here
    }
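
    sendSynchronousRequest:returningResponse:error: has been deprecated since iOS 9. A possible replacement, sketched here on the assumption that it is only ever called from a background thread, wraps an NSURLSession data task in a semaphore. FetchPageData is a hypothetical helper name:

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: fetches a URL and blocks the calling thread until
    // the request finishes. Intended for background queues, not the main thread.
    static NSData *FetchPageData(NSURL *url, NSError **outError) {
        __block NSData *result = nil;
        __block NSError *taskError = nil;
        dispatch_semaphore_t done = dispatch_semaphore_create(0);

        NSURLSessionDataTask *task =
            [[NSURLSession sharedSession] dataTaskWithURL:url
                                        completionHandler:^(NSData *data,
                                                            NSURLResponse *response,
                                                            NSError *error) {
                result = data;
                taskError = error;
                dispatch_semaphore_signal(done);
            }];
        [task resume];

        // Wait until the completion handler has stored the result
        dispatch_semaphore_wait(done, DISPATCH_TIME_FOREVER);

        if (result == nil && outError != NULL) {
            *outError = taskError;
        }
        return result;
    }
    ```

    Inside the loop, NSData *data = FetchPageData([NSURL URLWithString:myURL], &error); would then take the place of the NSURLConnection call, with the same data != nil check before parsing.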
    

    And if you don't want to wait for this loop to finish (it is made of synchronous NSURLRequests), call the parent method like this, so that your view controller doesn't freeze while waiting for the parser:

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    // Run the Google Shopping parser loop on a background queue
    dispatch_async(queue, ^{
        [self GShoppingParser];
    });
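
    Once the parser runs in the background, any UI update has to hop back to the main queue. A minimal sketch of that pattern follows; RunParserInBackground and its parameters are hypothetical, and the work block stands in for [self GShoppingParser]:

    ```objc
    #import <Foundation/Foundation.h>

    // Hypothetical helper: runs `work` on a global background queue, then runs
    // `completion` on `completionQueue` (pass dispatch_get_main_queue() when
    // the completion touches UIKit).
    static void RunParserInBackground(dispatch_block_t work,
                                      dispatch_queue_t completionQueue,
                                      dispatch_block_t completion) {
        dispatch_queue_t queue =
            dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        dispatch_async(queue, ^{
            work();                                       // e.g. the parser loop
            dispatch_async(completionQueue, completion);  // e.g. refresh the UI
        });
    }
    ```

    In a view controller this could be called with the parser as the work block and a table view reload as the completion on dispatch_get_main_queue().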