Convert HTML to properly formatted attributedString

I need to convert HTML data consisting of <h2>..</h2>, <p>..</p> and <a href=".."><img ..></a> elements into a properly formatted NSAttributedString. I want to map <h2> to UIFontTextStyleHeadline and <p> to UIFontTextStyleBody, and to store the image links. The output should be an attributed string containing only the heading and body elements; I will handle the images separately.

So far, I have this code:

NSMutableAttributedString *content = [[NSMutableAttributedString alloc]
         initWithData:[[post objectForKey:@"content"] dataUsingEncoding:NSUTF8StringEncoding]
              options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                   NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
   documentAttributes:nil error:nil];

which produces something like this:

    NSColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSFont = "<UICTFont: 0xd47bc00> font-family: \"TimesNewRomanPS-BoldMT\"; font-weight: bold; font-style: normal; font-size: 18.00pt";
    NSKern = 0;
    NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 14.94, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 2";
    NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSStrokeWidth = 0;
    NSAttachment = "<NSTextAttachment: 0xd486590>";
    NSColor = "UIDeviceRGBColorSpace 0 0 0.933333 1";
    NSFont = "<UICTFont: 0xd47cdb0> font-family: \"Times New Roman\"; font-weight: normal; font-style: normal; font-size: 12.00pt";
    NSKern = 0;
    NSLink = "";
    NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 12, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 0";
    NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0.933333 1";
    NSStrokeWidth = 0;
Body text, body text, body text. Body text, body text, body text.
    NSColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSFont = "<UICTFont: 0xd47cdb0> font-family: \"Times New Roman\"; font-weight: normal; font-style: normal; font-size: 12.00pt";
    NSKern = 0;
    NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 12, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 0";
    NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0 1";
    NSStrokeWidth = 0;

I am new to attributed strings and am looking for an efficient way to convert these attributes into the standard fonts mentioned above. Thank you.


  • In case anybody is looking for something similar: in the end I am using the TFHpple library to separate the images from the text elements in the HTML data, and then I change the font attributes of the attributed string like this:

    NSString *contentString = [self parseHTMLdata:bodyString];
    NSMutableAttributedString *content = [[NSMutableAttributedString alloc] initWithData:[contentString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)} documentAttributes:nil error:nil];
    // prepare new format
    NSRange effectiveRange = NSMakeRange(0, 0);
    NSDictionary *attributes;
    while (NSMaxRange(effectiveRange) < [content length]) {
        // fetch the next run of uniform attributes; effectiveRange is updated to cover that run
        attributes = [content attributesAtIndex:NSMaxRange(effectiveRange) effectiveRange:&effectiveRange];
        UIFont *font = [attributes objectForKey:NSFontAttributeName];
        // the HTML importer renders <h2> at 18pt; everything else is treated as body text
        if (font.pointSize == 18.0f) {
            [content addAttribute:NSFontAttributeName value:self.headlineFont range:effectiveRange];
        } else {
            [content addAttribute:NSFontAttributeName value:self.bodyFont range:effectiveRange];
        }
    }

    And the hpple part:

    - (NSString *)parseHTMLdata:(NSString *)content {
        NSData *data = [content dataUsingEncoding:NSUTF8StringEncoding];
        TFHpple *parser = [[TFHpple alloc] initWithHTMLData:data];
        NSString *xpathQueryString = @"//body";
        // walk the direct children of <body>
        NSArray *elements = [[[parser searchWithXPathQuery:xpathQueryString] firstObject] children];
        NSMutableString *textContent = [[NSMutableString alloc] init];
        for (TFHppleElement *element in elements) {
            if ([[element tagName] isEqualToString:@"h2"] || [[element tagName] isEqualToString:@"p"]) {
                if ([[[element firstChild] tagName] isEqualToString:@"a"]) {
                    // image element, just save it in array
                } else {
                    // pure h2 or p element, keep its raw HTML for the attributed string
                    [textContent appendString:[element raw]];
                }
            }
        }
        return textContent;
    }
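
    The "save it in array" step is left out above. A minimal sketch of one way to collect the image links, assuming every image is an <img> directly inside an <a>, could look like the following. The helper name ImageLinksInHTML and the "href"/"src" dictionary keys are made up for illustration; only the hpple calls (searchWithXPathQuery:, objectForKey:, firstChildWithTagName:) come from the library.

    ```objective-c
    #import <Foundation/Foundation.h>
    #import "TFHpple.h"

    // Hypothetical helper (not part of the code above): returns one
    // dictionary per <a href=".."><img src=".."></a> pair in the HTML.
    static NSArray *ImageLinksInHTML(NSString *html) {
        NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
        TFHpple *parser = [[TFHpple alloc] initWithHTMLData:data];
        NSMutableArray *links = [NSMutableArray array];

        // "//a[img]" selects every <a> element that has an <img> child.
        for (TFHppleElement *anchor in [parser searchWithXPathQuery:@"//a[img]"]) {
            NSString *href = [anchor objectForKey:@"href"];
            NSString *src = [[anchor firstChildWithTagName:@"img"] objectForKey:@"src"];
            // Skip anchors that lack either attribute.
            if (href != nil && src != nil) {
                [links addObject:@{@"href": href, @"src": src}];
            }
        }
        return links;
    }
    ```

    Running a separate XPath query keeps the image handling independent of the text extraction, at the cost of parsing the HTML a second time.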

    Checking the font size in the attributes may seem fragile. If it causes problems, I can dig deeper into the paragraph style, which records the heading level (HeaderLevel 2 for <h2>, 0 for body text).
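
    A somewhat less size-dependent variant, sketched here as an assumption and not something I have tested against this feed, is to look at the bold trait of each font run. The importer gives <h2> a bold font and <p> a regular one, as the attribute dump shows. ApplyTextStyles is a hypothetical helper name, and any <b> or <strong> text inside a paragraph would also be treated as a headline, so this only suits HTML as simple as the one described here.

    ```objective-c
    #import <UIKit/UIKit.h>

    // Hypothetical helper: replaces the importer's fonts with Dynamic Type fonts,
    // treating bold runs as headlines and everything else as body text.
    static void ApplyTextStyles(NSMutableAttributedString *content) {
        UIFont *headlineFont = [UIFont preferredFontForTextStyle:UIFontTextStyleHeadline];
        UIFont *bodyFont = [UIFont preferredFontForTextStyle:UIFontTextStyleBody];

        [content beginEditing];
        [content enumerateAttribute:NSFontAttributeName
                            inRange:NSMakeRange(0, content.length)
                            options:0
                         usingBlock:^(id value, NSRange range, BOOL *stop) {
            UIFont *font = value;
            // A run without a font yields nil here and falls through to the body font.
            BOOL isBold = (font.fontDescriptor.symbolicTraits & UIFontDescriptorTraitBold) != 0;
            // Changing attributes is allowed as long as it stays within the run's range.
            [content addAttribute:NSFontAttributeName
                            value:(isBold ? headlineFont : bodyFont)
                            range:range];
        }];
        [content endEditing];
    }
    ```

    Calling ApplyTextStyles(content) right after creating the attributed string would replace the while loop shown earlier.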