Tags: objective-c, parsing, nsstring, nsxmlparser, nsurl

Understanding urls correctly


I'm writing an RSS reader and taking article URLs from feeds, but I often get invalid URLs while parsing with NSXMLParser. Sometimes there are extra characters at the end of a URL (for example \n, \t); I've already fixed that issue. The most difficult problem is URLs with queries containing characters that must not be percent-encoded. This URL works for a URL request: http://www.bbc.co.uk/news/education-23809095#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa but the stringByAddingPercentEscapesUsingEncoding: method replaces the '#' character with "%23", and then the request no longer works: the site says the page was not found. I believe everything after the '#' character is a query string. Is there a way to get (encode) any URL from feeds correctly, or at least to always remove the query strings from the XML?


Solution

  • There are two approaches you could use to create a legal URL string: either use stringByAddingPercentEncodingWithAllowedCharacters: or use the Core Foundation function CFURLCreateStringByAddingPercentEscapes (declared in CFURL.h), which gives you a whole range of options.

    Example 1 (NSCharacterSet):

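    NSString *nonFormattedURL = @"http://www.bbc.co.uk/news/education-23809095#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa";
    
    // Allow everything that is legal in a URL query ('/', ':', '?', '&', '=', ...)
    // plus '#', so the fragment delimiter is not turned into "%23".
    NSMutableCharacterSet *allowedCharacters = [[NSCharacterSet URLQueryAllowedCharacterSet] mutableCopy];
    [allowedCharacters addCharactersInString:@"#"];
    
    NSLog(@"%@", [nonFormattedURL stringByAddingPercentEncodingWithAllowedCharacters:allowedCharacters]);
    

    This keeps the '#' in place because it is part of the allowed set, while characters outside the set (spaces, for example) are still percent-encoded. If you want more control, add characters to or remove them from the mutable set.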

    Example 2 (CFURL.h):

    NSString *nonFormattedURL = @"http://www.bbc.co.uk/news/education-23809095#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa";
    CFAllocatorRef allocator = CFAllocatorGetDefault();
    // CFURLCreateStringByAddingPercentEscapes follows the Create rule;
    // CFBridgingRelease hands ownership of the result to ARC.
    NSString *formattedURL = CFBridgingRelease(CFURLCreateStringByAddingPercentEscapes(allocator,
                                                                           (__bridge CFStringRef) nonFormattedURL,
                                                                           (__bridge CFStringRef) @"#", // characters to leave unescaped
                                                                           (__bridge CFStringRef) @"",  // legal URL characters to escape anyway, e.g. / = ? &
                                                                           kCFStringEncodingUTF8));     // a CFStringEncoding constant (NSStringEncoding values differ)
    
    NSLog(@"%@", formattedURL);
    

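    This does the same as the code above but with more control: the charactersToLeaveUnescaped and legalURLCharactersToBeEscaped arguments decide exactly which characters are replaced with their percent-escape sequence in the given encoding; check the log output to see the result. Note that CFURLCreateStringByAddingPercentEscapes, like stringByAddingPercentEscapesUsingEncoding:, is deprecated as of iOS 9 and OS X 10.11, so prefer the first approach in new code.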
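    If you also want to handle the stray \n and \t mentioned in the question, or fall back to dropping the query and fragment entirely, the steps can be combined into two small helpers. This is a sketch that assumes ARC; the function names CleanFeedURLString and URLStringWithoutQueryOrFragment are hypothetical, and the allowed set (URLQueryAllowedCharacterSet plus '#') is one reasonable choice rather than the only one.

    ```objective-c
    #import <Foundation/Foundation.h>

    // Hypothetical helper: trims whitespace and newlines (e.g. "\n", "\t") that may
    // surround a link parsed by NSXMLParser, then percent-encodes every character
    // outside the allowed set. '#', '/', '?', '&', '=' and ':' are left untouched.
    static NSString *CleanFeedURLString(NSString *raw) {
        NSString *trimmed = [raw stringByTrimmingCharactersInSet:
                             [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        NSMutableCharacterSet *allowed = [[NSCharacterSet URLQueryAllowedCharacterSet] mutableCopy];
        [allowed addCharactersInString:@"#"];
        return [trimmed stringByAddingPercentEncodingWithAllowedCharacters:allowed];
    }

    // Hypothetical helper for the fallback asked about in the question: cut the
    // string at the first '?' or '#', keeping only the scheme, host and path.
    static NSString *URLStringWithoutQueryOrFragment(NSString *urlString) {
        NSRange cut = [urlString rangeOfCharacterFromSet:
                       [NSCharacterSet characterSetWithCharactersInString:@"?#"]];
        return cut.location == NSNotFound ? urlString
                                          : [urlString substringToIndex:cut.location];
    }
    ```

    Because '%' is not in the allowed set, a link that is already percent-encoded (for example one containing %20) would be encoded a second time, so apply CleanFeedURLString only to raw strings taken from the feed.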