-[NSMutableAttributedString initWithHTML:documentAttributes:]
seems to mangle special characters:
NSString *html = @"“Hello” World"; // notice the smart quotes
NSData *htmlData = [html dataUsingEncoding:NSUTF8StringEncoding];
NSMutableAttributedString *as = [[NSMutableAttributedString alloc] initWithHTML:htmlData documentAttributes:nil];
NSLog(@"%@", as);
That prints “Hello†World
followed by some RTF commands. In my application, I convert the attributed string to RTF and display it in an NSTextView
, but the characters are corrupted there, too.
According to the documentation, the default encoding is UTF-8, but I tried being explicit and the result is the same:
NSDictionary *attributes = @{NSCharacterEncodingDocumentAttribute: [NSNumber numberWithInt:NSUTF8StringEncoding]};
NSMutableAttributedString *as = [[NSMutableAttributedString alloc] initWithHTML:htmlData documentAttributes:&attributes];
Use [html dataUsingEncoding:NSUnicodeStringEncoding]
when creating the NSData and set the matching encoding option when you parse the HTML into an attributed string:
The documentation for NSCharacterEncodingDocumentAttribute
is slightly confusing:
NSNumber, containing an int specifying the
NSStringEncoding
for the file; for reading and writing plain text files and writing HTML; default for plain text is the default encoding; default for HTML is UTF-8.
So, you code should be:
NSString *html = @"“Hello” World";
NSData *htmlData = [html dataUsingEncoding:NSUTF8StringEncoding];
NSDictionary *options = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)};
NSMutableAttributedString *as =
[[NSMutableAttributedString alloc] initWithHTML:htmlData
options: options
documentAttributes:nil];