I'd like to know whether calling stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: can return NSUTF16StringEncoding, NSUTF32StringEncoding, or any of their variants.
I'm asking because of this note in the cStringUsingEncoding: documentation:
Special Considerations
UTF-16 and UTF-32 are not considered to be C string encodings, and should not be used with this method—the results of passing NSUTF16StringEncoding, NSUTF32StringEncoding, or any of their variants are undefined.
So I understand that creating a C string with UTF-16 or UTF-32 is unsupported, but I'm not sure whether string encoding detection with stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: can return UTF-16 or UTF-32.
An example scenario (adapted from SSZipArchive.m) might be:
// name is a null-terminated C string built with `fread` from stdio.h:
char *name = (char *)malloc(size_name + 1);
size_t read = fread(name, 1, size_name + 1, file);
name[size_name] = '\0';
// dataName is the data object of name
NSData *dataName = [NSData dataWithBytes:(const void *)name length:sizeof(unsigned char) * size_name];
// stringName is the string object of dataName
NSString *stringName = nil;
NSStringEncoding encoding = [NSString stringEncodingForData:dataName encodingOptions:nil convertedString:&stringName usedLossyConversion:nil];
In the above code, can encoding be NSUTF16StringEncoding, NSUTF32StringEncoding, or any of their variants?
Platforms: macOS 10.10+, iOS 8.0+, watchOS 2.0+, tvOS 9.0+.
Yes, if the string is encoded using one of those encodings. The notes about C strings are specific to C strings. An NSString is not a C string, and the method you're describing doesn't work on C strings; it works on arbitrary data that may be encoded in a wide variety of ways.
As an example:
#import <Foundation/Foundation.h>
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        NSData *data = [@"test" dataUsingEncoding:NSUTF16StringEncoding];
        NSStringEncoding encoding = [NSString stringEncodingForData:data
                                                    encodingOptions:nil
                                                    convertedString:nil
                                                usedLossyConversion:NULL];
        // NSStringEncoding is an NSUInteger, so cast and print it as unsigned long.
        NSLog(@"%lu == %lu", (unsigned long)encoding,
              (unsigned long)NSUTF16StringEncoding);
    }
    return 0;
}
// Output: 10 == 10
That said, in your specific example, if name really is what the comment says it is, "a null-terminated C string," then it can never be UTF-16, because C strings cannot be encoded in UTF-16: C strings are \0-terminated, and a \0 byte is very common in UTF-16 (every ASCII character encoded as UTF-16 contains one). Without seeing more code, however, I would not gamble on whether that comment is accurate.
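To make the \0 point concrete, here is a minimal C sketch (plain C, so it also compiles as Objective-C). It assumes UTF-16LE without a byte order mark, and the names utf16le_test, count_nul_bytes and c_string_visible_length are hypothetical, made up for illustration rather than taken from any library:

```c
#include <stddef.h>
#include <string.h>

/* "test" encoded as UTF-16LE without a BOM (an assumption for this sketch):
   every ASCII character is followed by a 0x00 byte. */
static const unsigned char utf16le_test[] = {
    't', 0x00, 'e', 0x00, 's', 0x00, 't', 0x00
};

/* Hypothetical helper: counts the 0x00 bytes in a buffer of known length. */
static size_t count_nul_bytes(const unsigned char *bytes, size_t length) {
    size_t count = 0;
    for (size_t i = 0; i < length; i++) {
        if (bytes[i] == 0x00) {
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: the number of bytes a C-string API would see,
   i.e. everything before the first 0x00 byte (or the whole buffer if none). */
static size_t c_string_visible_length(const unsigned char *bytes, size_t length) {
    const unsigned char *nul = memchr(bytes, 0x00, length);
    return nul ? (size_t)(nul - bytes) : length;
}
```

For this 8-byte buffer, count_nul_bytes(utf16le_test, sizeof utf16le_test) is 4 and c_string_visible_length(utf16le_test, sizeof utf16le_test) is 1, which is also what strlen reports. Anything that treats these bytes as a C string stops after the first 't', which is why data that genuinely came from a null-terminated C string is unlikely to be UTF-16.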
If your real question here is "given an arbitrary c-string-safe encoding, is it possible that stringEncodingForData: will return a not-c-string-safe encoding," then the answer is "yes, it could, and it's definitely not promised that it won't even if it doesn't today." If you need to prevent that, I recommend using NSStringEncodingDetectionSuggestedEncodingsKey and ...UseOnlySuggestedEncodingsKey to force it to be an encoding you can handle. (You could also use ...DisallowedEncodingsKey to prevent specific multi-byte encodings, but that wouldn't be as robust.)