
Import bookmarks from an HTML file


I'm trying to add an import-bookmarks feature to my app. I have part of it working, but it just extracts all the URLs and all the titles.

- (NSArray *)urlsInHTML:(NSString *)html {
    NSError *error = nil;
    // Lookbehind/lookahead: match everything between href=" and the next "
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(?<=href=\").*?(?=\")" options:NSRegularExpressionCaseInsensitive error:&error];
    if (!regex) {
        NSLog(@"Invalid URL pattern: %@", error);
        return @[];
    }

    NSArray *arrayOfAllMatches = [regex matchesInString:html options:0 range:NSMakeRange(0, [html length])];

    NSMutableArray *arrayOfURLs = [[NSMutableArray alloc] init];

    for (NSTextCheckingResult *match in arrayOfAllMatches) {
        NSString *substringForMatch = [html substringWithRange:match.range];
        NSLog(@"Extracted URL: %@", substringForMatch);

        [arrayOfURLs addObject:substringForMatch];
    }

    // return non-mutable version of the array
    return [NSArray arrayWithArray:arrayOfURLs];
}

- (NSArray *)titlesOfTagsInHTML:(NSString *)html {
    NSError *error = nil;
    // Match the text between a closing "> and the next </ (e.g. Title in <A HREF="...">Title</A>)
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(?<=\">)(.*?)(?=</)" options:NSRegularExpressionCaseInsensitive error:&error];
    if (!regex) {
        NSLog(@"Invalid title pattern: %@", error);
        return @[];
    }

    NSArray *arrayOfAllMatches = [regex matchesInString:html options:0 range:NSMakeRange(0, [html length])];

    NSMutableArray *arrayOfTitles = [[NSMutableArray alloc] init];

    for (NSTextCheckingResult *match in arrayOfAllMatches) {
        NSString *substringForMatch = [html substringWithRange:match.range];
        NSLog(@"Extracted Title: %@", substringForMatch);

        [arrayOfTitles addObject:substringForMatch];
    }

    // return non-mutable version of the array
    return [NSArray arrayWithArray:arrayOfTitles];
}

- (IBAction)import {

    ProgressAlertView *progressAlert = [[ProgressAlertView alloc] initWithTitle:@"Crux" message:@"Importing Bookmarks..." delegate:self cancelButtonTitle:nil otherButtonTitles:nil];
    [progressAlert show];

    NSError *readError = nil;
    NSString *htmlString = [NSString stringWithContentsOfFile:importingBookmarkFilePath encoding:NSUTF8StringEncoding error:&readError];
    if (!htmlString) {
        NSLog(@"Could not read bookmarks file: %@", readError);
        return;
    }

    NSArray *urls = [self urlsInHTML:htmlString];
    NSArray *titles = [self titlesOfTagsInHTML:htmlString];

    // The two regexes run independently, so the arrays can differ in length;
    // only pair up as many entries as both arrays actually contain.
    NSUInteger count = MIN([urls count], [titles count]);
    BookmarkManager *manager = [BookmarkManager sharedInstance];
    for (NSUInteger i = 0; i < count; i++) {
        //float progress = (float)(i + 1) / (float)count;
        Bookmark *importedBookmark = [[Bookmark alloc] init];
        importedBookmark.url = urls[i];
        importedBookmark.title = titles[i];
        [[manager bookmarks] addObject:importedBookmark];
    }
    // Save once after the loop instead of once per bookmark
    [manager saveBookmarks];
}
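One weakness of the two-regex approach above is that the URL array and the title array are built independently: the title pattern also matches any other tag whose last attribute is quoted, such as an <H3> folder heading with attributes, so the two arrays can fall out of step. A minimal sketch, assuming bookmarks appear as <A HREF="...">Title</A> tags, that takes both values from the same match using capture groups; `BookmarkPairsInHTML` and the "url"/"title" dictionary keys are hypothetical names chosen for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns one dictionary per <A> tag, with the URL and
// the title taken from the same regex match so they can never get out of step.
NSArray<NSDictionary *> *BookmarkPairsInHTML(NSString *html) {
    NSError *error = nil;
    // Capture group 1 = the HREF value, group 2 = the link text up to </A>.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<a\\s[^>]*?href=\"([^\"]*)\"[^>]*>(.*?)</a>"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray *pairs = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:html options:0 range:NSMakeRange(0, html.length)];
    for (NSTextCheckingResult *match in matches) {
        NSString *url = [html substringWithRange:[match rangeAtIndex:1]];
        NSString *title = [html substringWithRange:[match rangeAtIndex:2]];
        [pairs addObject:@{ @"url" : url, @"title" : title }];
    }
    return [pairs copy];
}
```

In `-import`, each dictionary could fill one `Bookmark`, replacing the index-based pairing of two separate arrays. Like any regex approach, this still ignores the folder structure.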

But I can't figure out how to detect folders, so I can't keep the bookmarks organized exactly the way they were in the other browser. To see how Safari exports them, choose File > Export Bookmarks and open the resulting HTML file: everything is placed in a definition list (<DL>), along with the folder titles. Using NSRegularExpression or some other approach, how can I get each folder's title and everything in that folder?

I've tried using NSXMLParser to parse the HTML, but it stops at the first definition list tag and fails.


Solution

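The format is not that complicated, so you should be able to parse it with NSScanner. The general flow goes like this:

  • Scan up to the next <DT>
  • Check whether what follows is <H3 (a folder) or <A (a bookmark)
  • Process it accordingly
  • Repeat

Folders can contain subfolders, so you will need to build the folder objects recursively: after a folder's <H3>, parse the <DL> that follows as that folder's contents, and return to the parent folder when you reach that list's closing </DL>. Good luck.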
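The steps above can be sketched with NSScanner roughly as follows. This is a minimal sketch, assuming the Netscape-style bookmark file layout: folders as <DT><H3 ...>Title</H3> followed by their own <DL> ... </DL>, and bookmarks as <DT><A HREF="...">Title</A>. `BMFolder`, `BMLink`, `HrefInAttributes`, `ParseFolderContents`, and `ParseBookmarksHTML` are hypothetical names, not part of any framework; an app would map the result onto its own `Bookmark` and `BookmarkManager`. Titles are returned raw, so HTML entities such as &amp; are not decoded.

```objc
#import <Foundation/Foundation.h>

// Hypothetical model classes for this sketch.
@interface BMLink : NSObject
@property (nonatomic, copy) NSString *title;
@property (nonatomic, copy) NSString *url;
@end
@implementation BMLink
@end

@interface BMFolder : NSObject
@property (nonatomic, copy) NSString *title;
@property (nonatomic, strong) NSMutableArray *children; // BMFolder and BMLink objects, in file order
@end
@implementation BMFolder
- (instancetype)init {
    if ((self = [super init])) {
        _children = [[NSMutableArray alloc] init];
    }
    return self;
}
@end

// Pulls the HREF="..." value out of an <A ...> tag's attribute text.
static NSString *HrefInAttributes(NSString *attributes) {
    NSScanner *scanner = [NSScanner scannerWithString:attributes];
    scanner.caseSensitive = NO;
    NSString *href = @"";
    [scanner scanUpToString:@"HREF=\"" intoString:NULL];
    if ([scanner scanString:@"HREF=\"" intoString:NULL]) {
        [scanner scanUpToString:@"\"" intoString:&href];
    }
    return href;
}

// Reads <DT> entries into `folder` until the </DL> that closes its list.
static void ParseFolderContents(NSScanner *scanner, BMFolder *folder) {
    while (![scanner isAtEnd]) {
        [scanner scanUpToString:@"<" intoString:NULL];
        if ([scanner scanString:@"</DL>" intoString:NULL]) {
            return; // end of this folder's list
        }
        if ([scanner scanString:@"<DT>" intoString:NULL]) {
            if ([scanner scanString:@"<H3" intoString:NULL]) {
                // Folder: read its title, then recurse into the <DL> that follows.
                BMFolder *subfolder = [[BMFolder alloc] init];
                NSString *title = @"";
                [scanner scanUpToString:@">" intoString:NULL]; // skip any H3 attributes
                [scanner scanString:@">" intoString:NULL];
                [scanner scanUpToString:@"</H3>" intoString:&title];
                [scanner scanString:@"</H3>" intoString:NULL];
                subfolder.title = title;
                [scanner scanUpToString:@"<DL>" intoString:NULL];
                [scanner scanString:@"<DL>" intoString:NULL];
                ParseFolderContents(scanner, subfolder);
                [folder.children addObject:subfolder];
            } else if ([scanner scanString:@"<A" intoString:NULL]) {
                // Bookmark: attributes up to '>', then the title up to </A>.
                NSString *attributes = @"";
                NSString *title = @"";
                [scanner scanUpToString:@">" intoString:&attributes];
                [scanner scanString:@">" intoString:NULL];
                [scanner scanUpToString:@"</A>" intoString:&title];
                [scanner scanString:@"</A>" intoString:NULL];
                BMLink *link = [[BMLink alloc] init];
                link.url = HrefInAttributes(attributes);
                link.title = title;
                [folder.children addObject:link];
            }
            continue;
        }
        [scanner scanString:@"<" intoString:NULL]; // any other tag (<p>, <DD>, <H1>...): skip it
    }
}

// Parses a whole exported file into an untitled root folder.
BMFolder *ParseBookmarksHTML(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.caseSensitive = NO; // so <dt> and <DT> both match
    BMFolder *root = [[BMFolder alloc] init];
    root.title = @"";
    [scanner scanUpToString:@"<DL>" intoString:NULL];
    [scanner scanString:@"<DL>" intoString:NULL];
    ParseFolderContents(scanner, root);
    return root;
}
```

Calling `ParseBookmarksHTML(htmlString)` from `-import` would return a root folder whose `children` keep the original order and nesting, which can then be walked recursively to create the app's own folders and bookmarks. Consuming one "<" at a time in the fallback branch guarantees the loop always advances, even on tags it does not recognize.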