delphi character-encoding file-format rar

How are non-ASCII file names encoded in RAR files?


I have a RAR file with non-ASCII letters in its file names, and I am trying to decode those names in Delphi. My code works fine for ASCII file names but fails on these. The names are neither WideChar nor UTF-8. I found the RAR spec here: http://ams.cern.ch/AMS/amsexch/arch/rar/technote.txt but it says nothing about the character encoding. I also tried WOTSIT.org, but all of its RAR links are dead (almost every link there is dead; I even contacted the admin, but he didn't respond and didn't fix the links). It does not seem to be an 8-bit encoding, but I have no idea what it could be.


Solution

  • This is the only paragraph in that spec that says anything about the file name:

    0x200 - FILE_NAME contains both usual and encoded
            Unicode name separated by zero. In this case
            NAME_SIZE field is equal to the length
            of usual name plus encoded Unicode name plus 1.
    
            If this flag is present, but FILE_NAME does not
            contain zero bytes, it means that file name
            is encoded using UTF-8.
    

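    So when the 0x200 flag is set and FILE_NAME contains no zero byte, the name should be UTF-8, but you say it is not. Could your names fall under the first case instead, where a zero byte separates the usual name from an encoded Unicode name? Can you check the flag and try again?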
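
The two cases in the quoted paragraph can be told apart with a check like the one below. This is only a rough sketch in Python (the same logic should port to Delphi directly). The function name `split_rar_file_name`, the `legacy_codepage` parameter, and its `cp437` default are my own assumptions and do not come from the spec. The quoted text does not explain how the "encoded Unicode name" is packed, so the sketch returns that part as raw bytes instead of guessing.

```python
FLAG_UNICODE_NAME = 0x200  # flag value taken from the quoted spec paragraph


def split_rar_file_name(flags, raw_name, legacy_codepage="cp437"):
    """Classify the FILE_NAME bytes of a file header.

    raw_name is the NAME_SIZE bytes read from the FILE_NAME field.
    Returns (name, encoded_unicode_part); the second item is None
    unless the field holds both a usual and an encoded Unicode name.
    legacy_codepage is an assumption: the spec excerpt does not say
    which 8-bit code page the "usual" name uses.
    """
    if flags & FLAG_UNICODE_NAME:
        usual, separator, encoded = raw_name.partition(b"\x00")
        if not separator:
            # Flag set and no zero byte: the whole field is UTF-8.
            return raw_name.decode("utf-8"), None
        # Flag set and a zero byte present: usual name, zero, encoded name.
        # The encoded part is returned undecoded (format not in the excerpt).
        return usual.decode(legacy_codepage, errors="replace"), encoded
    # Flag not set: a plain 8-bit name.
    return raw_name.decode(legacy_codepage, errors="replace"), None


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    print(split_rar_file_name(0x200, "na\u00efve.txt".encode("utf-8")))
    print(split_rar_file_name(0x200, b"naive.txt\x00\x01\x02\x03"))
```

If the second element comes back as something other than `None`, your file is using the first case, which would explain why decoding the whole field as UTF-8 or WideChar fails.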