Tags: objective-c, macos, utf8-decode

Scandinavian characters æ, ø, å escaped incorrectly


My program interfaces with servers in other countries and regularly needs to handle URLs containing foreign characters. This works fine except for Scandinavian characters such as æ, ø, and å. When I receive a URL, I decode it as follows:

-(NSString*)urlDECODE:(NSString*)string
{
    NSString*   s = [string stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];

    return (s)?s:string;
}

This fails to properly decode these characters, however:

filename: æøåa.rtf
input: %C3%83%C2%A6%C3%83%C2%B8a%C3%8C%C2%8Aa.rtf
output: Ã¦Ã¸aÌa.rtf (an invisible U+008A control character sits between Ì and the final a)

EDIT: This is the encoding function:

NSString * URLEncode(NSString * url)
{
    NSString* out = nil;
    @try
    {
        NSLog(@"BEFORE=%@",url);
        out = [url stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
        NSLog(@"AFTER=%@",out);
    }
    @catch (NSException * e)
    {
        NSLog(@"Encoding error: %@", e);
    }

    return out;
}

Solution

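  • It seems your original URL string was already mis-encoded before it reached URLEncode(). Its UTF-8 bytes were read as ISO Latin-1 characters and then encoded to UTF-8 a second time:

    "Ã¦Ã¸aÌ\u008aa.rtf" == "\xc3\xa6\xc3\xb8a\xcc\x8aa.rtf"    // one Latin-1 byte per character
                        == "æ"      "ø"    "a\u030a" "a.rtf"  // in UTF-8
                        == "æøåa.rtf"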
    

    Please check how the NSString passed to URLEncode() is constructed. The rest of the code you've shown is correct (although catching exceptions like this is unusual in Objective-C).