Let me start off by saying that I am not particularly trying to find a solution, just the root cause of the problem. I am trying to retrieve JSON from a URL. In the browser, the URL loads fine and I can see the entire JSON without issue. However, in Xcode, when simply using NSURLConnection, I get data bytes back, but my NSString is nil.
theString = [[NSString alloc] initWithData:urlData encoding:NSUTF8StringEncoding];
After doing some research, I found that I am probably using the wrong encoding. I am not sure which encoding the URL uses, so on first instinct I just tried a few other encodings.
NSString* myString = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
NSString* myString2 = [[NSString alloc] initWithData:data encoding:NSUTF16StringEncoding];
NSString* myString3 = [[NSString alloc] initWithData:data encoding:NSWindowsCP1252StringEncoding];
NSASCIIStringEncoding and NSWindowsCP1252StringEncoding are able to bring back partially correct JSON. It is not the entire JSON that I am able to view in the browser, and some characters are a little messed up, but it is something. To better determine which encoding was used, I decided to use the following method and look at the encoding it reports.
NSError *error = nil;
NSStringEncoding encoding;
NSString *my_string = [[NSString alloc] initWithContentsOfURL:url
                                                 usedEncoding:&encoding
                                                        error:&error];
My NSStringEncoding value is 3221214344, and this number is the same every time I run the app. I cannot find any NSStringEncoding value that even comes close to matching this.
My final question is: Is the encoding used for this URL not consumable by iOS, is it possible that multiple encodings were used for this URL, or is there something else that I could be doing wrong on my end?
It's best not to rely on Cocoa to figure out the string encoding if possible, especially if the data might be corrupted. A better approach is to check whether the HTTP Content-Type header specifies a character set, as in this example:
Content-Type: text/html; charset=ISO-8859-4
Once you're able to parse and retrieve a character set name from the Content-Type header, you need to convert it to an NSStringEncoding, first by passing it to CFStringConvertIANACharSetNameToEncoding, and then passing the returned CF string encoding to CFStringConvertEncodingToNSStringEncoding. After that, you can initialize your string using -[NSString initWithData:encoding:].
NSData *HTTPResponseBody = …; // Get the HTTP response body
NSString *charSetName = …; // Get a charset name from the Content-Type HTTP header
// Get the Core Foundation string encoding
// The __bridge cast is required when compiling under ARC
CFStringEncoding cfencoding = CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)charSetName);
// Confirm this is a known encoding
if (cfencoding != kCFStringEncodingInvalidId) {
    // Convert to the Foundation encoding and initialize the string
    NSStringEncoding nsencoding = CFStringConvertEncodingToNSStringEncoding(cfencoding);
    NSString *JSON = [[NSString alloc] initWithData:HTTPResponseBody
                                           encoding:nsencoding];
}
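If the response comes back as an NSURLResponse, one way to get the charset name without parsing the header by hand is its textEncodingName property, which Foundation fills in from the Content-Type header when a charset is present. The following is only a minimal sketch under that assumption: the helper name EncodingForResponse and its fallback parameter are made up for illustration and are not part of any framework.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: maps the charset reported by a response to an
// NSStringEncoding, returning `fallback` when no usable charset is reported.
static NSStringEncoding EncodingForResponse(NSURLResponse *response,
                                            NSStringEncoding fallback) {
    // nil when the Content-Type header carried no charset
    NSString *charSetName = response.textEncodingName;
    if (charSetName == nil) {
        return fallback;
    }
    CFStringEncoding cfencoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)charSetName);
    // Unknown or misspelled charset names map to kCFStringEncodingInvalidId
    if (cfencoding == kCFStringEncodingInvalidId) {
        return fallback;
    }
    return CFStringConvertEncodingToNSStringEncoding(cfencoding);
}
```

The result can be passed straight to -[NSString initWithData:encoding:]. Choosing the fallback is a judgment call; NSUTF8StringEncoding is a reasonable guess for JSON, but it is still a guess.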
You may still run into problems if the string data you're working with is corrupted. For example, in the above code snippet, perhaps charSetName is UTF-8, but HTTPResponseBody can't be parsed as UTF-8 because there's an invalid byte sequence. In this situation, Cocoa will return nil when you try to instantiate your string, and short of sanitizing the data so that it conforms to the reported string encoding (perhaps by stripping out invalid byte sequences), you may want to report an error back to the end user.
As a last-ditch effort, rather than reporting an error, you could initialize a string using an encoding that assigns a character to every possible byte value and therefore never fails, such as NSMacOSRomanStringEncoding. The one caveat here is that non-ASCII or corrupted bytes may show up as unexpected symbols or letters.
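The last-ditch fallback could look roughly like the sketch below. The function name DecodeWithFallback and the usedFallback out-parameter are hypothetical, chosen for illustration. It tries the reported encoding first and only switches to NSMacOSRomanStringEncoding when that attempt returns nil.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decodes `data` with `preferred`, falling back to
// Mac OS Roman when the bytes are not valid in the preferred encoding.
// `usedFallback` may be NULL; when set to YES the text may contain mojibake.
static NSString *DecodeWithFallback(NSData *data,
                                    NSStringEncoding preferred,
                                    BOOL *usedFallback) {
    NSString *string = [[NSString alloc] initWithData:data encoding:preferred];
    BOOL fellBack = NO;
    if (string == nil) {
        // Mac OS Roman maps every byte value to a character
        string = [[NSString alloc] initWithData:data
                                       encoding:NSMacOSRomanStringEncoding];
        fellBack = YES;
    }
    if (usedFallback != NULL) {
        *usedFallback = fellBack;
    }
    return string;
}
```

Returning the flag lets the caller decide whether to show the text as-is, log the problem, or warn the user that some characters may be wrong.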