I need to convert HTML data that consists of <h2>..</h2>, <p>..</p> and <a href=".."><img ..></a> elements into an attributed string with proper formatting. I want to assign <h2> to UIFontTextStyleHeadline1 and <p> to UIFontTextStyleBody, and to store the image links. The output should be an attributed string that contains only the heading and body elements; I will handle the images separately.
So far, I have this code:
NSMutableAttributedString *content = [[NSMutableAttributedString alloc]
      initWithData:[[post objectForKey:@"content"] dataUsingEncoding:NSUTF8StringEncoding]
           options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                     NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
documentAttributes:nil
             error:nil];
which produces output like this:
Heading
{
NSColor = "UIDeviceRGBColorSpace 0 0 0 1";
NSFont = "<UICTFont: 0xd47bc00> font-family: \"TimesNewRomanPS-BoldMT\"; font-weight: bold; font-style: normal; font-size: 18.00pt";
NSKern = 0;
NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 14.94, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 2";
NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0 1";
NSStrokeWidth = 0;
}{
NSAttachment = "<NSTextAttachment: 0xd486590>";
NSColor = "UIDeviceRGBColorSpace 0 0 0.933333 1";
NSFont = "<UICTFont: 0xd47cdb0> font-family: \"Times New Roman\"; font-weight: normal; font-style: normal; font-size: 12.00pt";
NSKern = 0;
NSLink = "http://www.placeholder.com/image.jpg";
NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 12, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 0";
NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0.933333 1";
NSStrokeWidth = 0;
}
Body text, body text, body text. Body text, body text, body text.
{
NSColor = "UIDeviceRGBColorSpace 0 0 0 1";
NSFont = "<UICTFont: 0xd47cdb0> font-family: \"Times New Roman\"; font-weight: normal; font-style: normal; font-size: 12.00pt";
NSKern = 0;
NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 12, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 0/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (null), Lists (null), BaseWritingDirection 0, HyphenationFactor 0, TighteningFactor 0, HeaderLevel 0";
NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0 1";
NSStrokeWidth = 0;
}
I am new to attributed strings and am looking for an efficient way to convert these attributes into the standard fonts mentioned above.
In case somebody is looking for something similar: in the end I used the TFHpple library to separate the images from the text elements in the HTML data, and then I changed the font attributes of the attributed string like this:
NSString *contentString = [self parseHTMLdata:bodyString];
NSMutableAttributedString *content = [[NSMutableAttributedString alloc]
      initWithData:[contentString dataUsingEncoding:NSUTF8StringEncoding]
           options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                     NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
documentAttributes:nil
             error:nil];

// Walk the attribute runs and swap the importer's fonts for the target fonts:
// <h2> runs come out of the importer at 18 pt, everything else is treated as body text.
NSRange effectiveRange = NSMakeRange(0, 0);
NSDictionary *attributes;
while (NSMaxRange(effectiveRange) < [content length]) {
    attributes = [content attributesAtIndex:NSMaxRange(effectiveRange) effectiveRange:&effectiveRange];
    UIFont *font = [attributes objectForKey:NSFontAttributeName];
    if (font.pointSize == 18.0f) {
        [content addAttribute:NSFontAttributeName value:self.headlineFont range:effectiveRange];
    } else {
        [content addAttribute:NSFontAttributeName value:self.bodyFont range:effectiveRange];
    }
}
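The same replacement can also be written with enumerateAttribute:inRange:options:usingBlock:, which hands each font run to a block instead of advancing effectiveRange by hand; mutating attributes inside the processed range is allowed during this enumeration. A minimal sketch, assuming (as the loop above does) that the importer renders <h2> at 18 pt. ApplyTextStyles is a hypothetical helper name, and it uses UIFontTextStyleHeadline, the headline constant in the released UIKit SDK, in place of UIFontTextStyleHeadline1:

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper: replaces every font run in `content` with a Dynamic Type font.
// Assumption: the HTML importer renders <h2> at 18 pt and <p> at a smaller size.
static void ApplyTextStyles(NSMutableAttributedString *content)
{
    UIFont *headlineFont = [UIFont preferredFontForTextStyle:UIFontTextStyleHeadline];
    UIFont *bodyFont = [UIFont preferredFontForTextStyle:UIFontTextStyleBody];

    [content beginEditing];
    [content enumerateAttribute:NSFontAttributeName
                        inRange:NSMakeRange(0, content.length)
                        options:0
                     usingBlock:^(id value, NSRange range, BOOL *stop) {
        UIFont *font = value;
        // A run without a font (value == nil) has pointSize 0, so it gets the body font.
        UIFont *newFont = (font.pointSize == 18.0) ? headlineFont : bodyFont;
        [content addAttribute:NSFontAttributeName value:newFont range:range];
    }];
    [content endEditing];
}
```

Because the fonts come from preferredFontForTextStyle:, they reflect the user's text size setting at the moment of the call; if that setting changes while the text is on screen, the string has to be rebuilt (for example in response to UIContentSizeCategoryDidChangeNotification).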
And the hpple part:
- (NSString *)parseHTMLdata:(NSString *)content
{
    NSData *data = [content dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:data];
    NSString *xpathQueryString = @"//body";
    NSArray *elements = [[[parser searchWithXPathQuery:xpathQueryString] firstObject] children];
    NSMutableString *textContent = [[NSMutableString alloc] init];
    for (TFHppleElement *element in elements) {
        if ([[element tagName] isEqualToString:@"h2"] || [[element tagName] isEqualToString:@"p"]) {
            if ([[[element firstChild] tagName] isEqualToString:@"a"]) {
                // image element, just save it in array
            } else {
                // pure h2 or p element
                [textContent appendString:[element raw]];
            }
        }
    }
    return textContent;
}
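The placeholder comment in the image branch could store the link roughly like this. A sketch, assuming the <a> is the first child of its <h2> or <p>, as in the question's HTML; ExtractTextAndImageLinks is a hypothetical name, and it relies on TFHppleElement's objectForKey:, which returns the value of an element attribute:

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical variant of parseHTMLdata: that also collects the image links.
// Returns the raw HTML of the text-only <h2>/<p> elements and appends the href
// of each <a> that is the first child of such an element to `imageLinks`.
static NSString *ExtractTextAndImageLinks(NSString *html, NSMutableArray *imageLinks)
{
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:data];
    NSArray *elements = [[[parser searchWithXPathQuery:@"//body"] firstObject] children];

    NSMutableString *textContent = [NSMutableString string];
    for (TFHppleElement *element in elements) {
        NSString *tag = [element tagName];
        if (![tag isEqualToString:@"h2"] && ![tag isEqualToString:@"p"]) {
            continue; // skip whitespace text nodes and any other tags
        }
        TFHppleElement *first = [element firstChild];
        if ([[first tagName] isEqualToString:@"a"]) {
            // Image element: keep only its link for separate handling.
            NSString *href = [first objectForKey:@"href"];
            if (href) {
                [imageLinks addObject:href];
            }
        } else {
            // Pure h2 or p element: keep its HTML for the attributed string.
            [textContent appendString:[element raw]];
        }
    }
    return textContent;
}
```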
Checking the font size in the attributes may seem fragile; if it causes problems, I can dig deeper into the paragraph style, which records the heading level (HeaderLevel 2 for the heading and HeaderLevel 0 for the body in the dump above).
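Another check that does not depend on the exact point size: in the dump above, the <h2> run uses TimesNewRomanPS-BoldMT while the body runs use regular Times New Roman, so the bold symbolic trait of the font (UIFontDescriptor, iOS 7 and later) also separates them. Whether NSParagraphStyle's headerLevel is public on your target SDK is worth checking before relying on it, since it is documented as an AppKit property. A minimal sketch; IsHeadingFont is a hypothetical name, and any bold text inside a <p> (for example <b> or <strong>) would also be classified as a heading:

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper: YES when the importer gave this run a bold font,
// which is how the <h2> runs appear in the attribute dump.
static BOOL IsHeadingFont(UIFont *font)
{
    UIFontDescriptorSymbolicTraits traits = font.fontDescriptor.symbolicTraits;
    return (traits & UIFontDescriptorTraitBold) != 0;
}
```

In the loop above, the condition font.pointSize == 18.0f could then become IsHeadingFont(font).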