NSAttributedString initWithHTML incorrect character encoding?


-[NSMutableAttributedString initWithHTML:documentAttributes:] seems to mangle special characters:

NSString *html = @"“Hello” World"; // notice the smart quotes
NSData *htmlData = [html dataUsingEncoding:NSUTF8StringEncoding];
NSMutableAttributedString *as = [[NSMutableAttributedString alloc] initWithHTML:htmlData documentAttributes:nil];
NSLog(@"%@", as);

That prints “Hello†World followed by some RTF commands. In my application, I convert the attributed string to RTF and display it in an NSTextView, but the characters are corrupted there, too.

According to the documentation, the default encoding is UTF-8, but I tried being explicit and the result is the same:

NSDictionary *attributes = @{NSCharacterEncodingDocumentAttribute: [NSNumber numberWithInt:NSUTF8StringEncoding]};
NSMutableAttributedString *as = [[NSMutableAttributedString alloc] initWithHTML:htmlData documentAttributes:&attributes];
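
The mangled output is consistent with a familiar failure mode: each smart quote takes three bytes in UTF-8, and a parser that assumes a single-byte Western encoding turns those three bytes into three separate characters. In Windows-1252, for instance, 0xE2 is â, 0x80 is € and 0x9C is œ, while 0x9D is unassigned, which would explain why the closing quote comes out one character short. Whether the HTML importer really falls back to that particular encoding is an assumption here; the sketch below only dumps the UTF-8 bytes so the pattern can be checked. UTF8HexDump is a hypothetical helper, not part of Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns the UTF-8 bytes of a
// string as space-separated lowercase hex pairs, e.g. @"41 42" for @"AB".
static NSString *UTF8HexDump(NSString *string) {
    NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];
    const unsigned char *bytes = data.bytes;
    NSMutableArray<NSString *> *pairs = [NSMutableArray arrayWithCapacity:data.length];
    for (NSUInteger i = 0; i < data.length; i++) {
        [pairs addObject:[NSString stringWithFormat:@"%02x", bytes[i]]];
    }
    return [pairs componentsJoinedByString:@" "];
}
```

UTF8HexDump(@"“") gives e2 80 9c and UTF8HexDump(@"”") gives e2 80 9d, which line up with the â€œ and â€ sequences in the logged string.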

Solution

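  • Use an explicit encoding when creating the NSData, for example [html dataUsingEncoding:NSUnicodeStringEncoding] or NSUTF8StringEncoding as in the code at the end of this answer, and pass the matching encoding in the options: dictionary when you parse the HTML into an attributed string. The documentAttributes: parameter returns attributes by reference, so an encoding passed there is not used as input.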

    The documentation for NSCharacterEncodingDocumentAttribute is slightly confusing:

    NSNumber, containing an int specifying the NSStringEncoding for the file; for reading and writing plain text files and writing HTML; default for plain text is the default encoding; default for HTML is UTF-8.

    So, your code should be:

    NSString *html = @"“Hello” World";
    NSData *htmlData = [html dataUsingEncoding:NSUTF8StringEncoding];
    // The encoding passed in options must match the one used to create htmlData.
    NSDictionary *options = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                              NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)};
    NSMutableAttributedString *as =
        [[NSMutableAttributedString alloc] initWithHTML:htmlData
                                                options:options
                                     documentAttributes:nil];
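
The same idea can be wrapped in a small function and checked with a round trip. This is only a sketch and it assumes macOS with AppKit. Unlike the code above, it uses -initWithData:options:documentAttributes:error: together with the NSDocumentTypeDocumentOption and NSCharacterEncodingDocumentOption reading keys, so that a parse failure is reported through an NSError. AttributedStringFromHTML is a hypothetical name chosen for illustration.

```objc
#import <AppKit/AppKit.h>

// Hypothetical helper (not from the original answer): parses an HTML string
// into an attributed string, stating the encoding explicitly on both sides.
static NSAttributedString *AttributedStringFromHTML(NSString *html, NSError **error) {
    // Encode the string as UTF-8 ...
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    if (data == nil) {
        return nil;
    }
    // ... and tell the importer that the bytes are HTML encoded as UTF-8.
    NSDictionary *options = @{
        NSDocumentTypeDocumentOption: NSHTMLTextDocumentType,
        NSCharacterEncodingDocumentOption: @(NSUTF8StringEncoding),
    };
    return [[NSAttributedString alloc] initWithData:data
                                            options:options
                                 documentAttributes:NULL
                                              error:error];
}
```

If the encodings match, the string property of the result should contain the smart quotes unchanged; logging [AttributedStringFromHTML(html, NULL) string] is a quick way to confirm that before converting to RTF.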