So I'm taking a data file and decoding it into a string:
///////////////////////////////
// Get the string
NSString *dataString = [[NSString alloc] initWithData:data
                                             encoding:encoding];
NSLog(@"dataString = %@", dataString);
The file was a list of French words and they NSLog fine, showing appropriate accents (just one example):
abandonnèrent
Now, in the very next part of the code, I take this NSString of the file contents and convert it to a dictionary where the words are the keys and each object is a details dictionary holding the frequency and, when present, the native word:
///////////////////////////////
// Now parse the file (string)
NSMutableDictionary *mutableWordlist = [[NSMutableDictionary alloc] init];
int i = 0;
for (NSString *line in [dataString componentsSeparatedByString:@"\n"]) {
    NSArray *words = [line componentsSeparatedByString:@"\t"];
    NSNumber *count = [NSNumber numberWithInt:(i+1)];
    NSArray *keyArray;
    NSArray *objectArray;
    if ([words count] < 2) { // No native word
        keyArray = [[NSArray alloc] initWithObjects:@"frequency", nil];
        objectArray = [[NSArray alloc] initWithObjects:count, nil];
    }
    else {
        keyArray = [[NSArray alloc] initWithObjects:@"frequency", @"native", nil];
        objectArray = [[NSArray alloc] initWithObjects:count, [words[1] lowercaseString], nil];
    }
    NSDictionary *detailsDict = [[NSDictionary alloc] initWithObjects:objectArray forKeys:keyArray];
    [mutableWordlist setObject:detailsDict forKey:[words[0] lowercaseString]];
    i++;
}
self.wordlist = mutableWordlist;
NSLog(@"self.wordlist = %@", self.wordlist);
But here the keys have encoding issues and log like this if they contain an accent:
"abandonn\U00e8rent
" = {
frequency = 24220;
};
What is happening?
Nothing (wrong) is happening.
When you NSLog an NSString, it is output as Unicode text. However, when you NSLog the NSDictionary, the keys are output with Unicode escape sequences. \U00e8 is the escape sequence for è in that output format, the kind of escape you would use in a string if you cannot type an è directly, say because your source file is in ASCII (in an Objective-C string literal it would be written \u00e8).
So the difference is only in how the string is printed; the string itself is not different.
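If you want to convince yourself, here is a minimal sketch (not taken from the code in the question) that logs the same string both ways and then looks it up in the dictionary. The helper WordlistContainsWord and the sample dictionary are hypothetical names made up for this illustration, and the word is written with the \u00e8 escape so the example does not depend on the source file's encoding.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper for this illustration only:
// returns YES if `wordlist` has an entry stored under `word`.
static BOOL WordlistContainsWord(NSDictionary *wordlist, NSString *word) {
    return wordlist[word] != nil;
}

int main(void) {
    @autoreleasepool {
        // "abandonnèrent", written with an escape so the source stays ASCII.
        NSString *word = @"abandonn\u00e8rent";
        NSDictionary *wordlist = @{ word : @{ @"frequency" : @24220 } };

        // Logging the string directly shows the accented character.
        NSLog(@"word = %@", word);

        // Logging the whole dictionary goes through its description,
        // which is expected to show the key with a \U00e8 escape instead.
        NSLog(@"wordlist = %@", wordlist);

        // Logging each key on its own shows the accent again.
        for (NSString *key in wordlist) {
            NSLog(@"key = %@", key);
        }

        // The lookup succeeds, so the stored key is the original string.
        NSLog(@"found = %d", WordlistContainsWord(wordlist, word));
    }
    return 0;
}
```

If the lookup on the last line succeeds, the key in the dictionary is exactly the string you put in, and only the dictionary's printed description differs.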
HTH