
I want to identify the encoding of these videos


I have a bunch of videos I downloaded 20 years ago. I believe the website had them in Japanese. My PC at the time didn't understand Unicode characters, and I think I downloaded them with Download Accelerator Plus. So all of the video titles look like a mixture of broken ASCII and URL-encoded characters.

Is there any way to get these titles back? Here are some samples:

  1. %ec†%a1%ecŠ%b9%ec„%a0.avi
  2. %ea%b0•%ec%a2…%ea%b5%ac, %ec†%ec%a3%bc%ed™˜.avi
  3. %ea%b5%ac%ec%a2…%eb%a7Œ.avi
  4. %ecœ%a4%ec%b0%bd%ec%bc.avi
  5. %ea%b6Œ%eb%af%bc%ec%a3%bc (%e2˜…%e2˜…).avi

I don't remember the URL, so I cannot check web archives.

Any input welcome.

Thank you


Solution

  • How did you translate it all?

    First, assume the bytes are UTF-8, because 0xec, 0xed and 0xea are lead bytes of three-byte UTF-8 sequences; then

    • convert every URL-encoded character to its byte value (e.g. %a1 to 0xa1), and
    • take the Windows-1252 (ANSI code page 1252) byte value of every literal character, e.g. †Š•…™˜Œœ,().

    Then you have the UTF-8 byte sequence for the whole string, and you can simply decode it.
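The two steps above can be sketched in Python. This is only an illustration of the manual procedure, not the tool the answer used: `recover_title` and `TOKEN` are hypothetical names, and the sketch assumes that every literal character in the title exists in Windows-1252 (Python's `cp1252` codec raises `UnicodeEncodeError` otherwise).

```python
import re

# Either a %xx escape (group 1) or any single literal character (group 2).
TOKEN = re.compile(r"%([0-9A-Fa-f]{2})|(.)", re.DOTALL)

def recover_title(mangled: str) -> str:
    """Rebuild the UTF-8 bytes of a mangled title and decode them."""
    raw = bytearray()
    for hex_pair, literal in TOKEN.findall(mangled):
        if hex_pair:
            # %a1 -> byte 0xa1
            raw.append(int(hex_pair, 16))
        else:
            # literal character -> its Windows-1252 byte, e.g. † -> 0x86
            raw += literal.encode("cp1252")
    # Truncated sequences show up as U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")

print(recover_title("%ec†%a1%ecŠ%b9%ec„%a0.avi"))  # 송승선.avi
```

Titles that lost a byte still decode with `errors="replace"`, but the damaged syllable comes out as the replacement character U+FFFD, which makes it easy to spot the names that need manual repair.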

    Exceptions:

    • a byte is missing in string #17 (sample 2 in the question): the two-byte sequence %ec† must be completed to a three-byte sequence, here by adding 0x81;
    • the same in string #19 (sample 4 in the question): the two-byte sequence %ec%bc must be completed to a three-byte sequence.
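The answer does not say why these bytes went missing. One possible explanation, offered here only as an assumption, is that the lost byte had no character in Windows-1252, which leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D unassigned. Under that assumption, the sketch below lists the syllable each candidate byte would produce; `candidate_syllables` and `CP1252_UNDEFINED` are hypothetical names.

```python
# The five byte values that Windows-1252 leaves unassigned.
CP1252_UNDEFINED = (0x81, 0x8D, 0x8F, 0x90, 0x9D)

def candidate_syllables(prefix: bytes) -> dict:
    """Complete a two-byte UTF-8 prefix with each unassigned cp1252 byte."""
    return {
        b: (prefix + bytes([b])).decode("utf-8")
        for b in CP1252_UNDEFINED
    }

# Candidates for the truncated %ec† (0xec, 0x86) in string #17.
for byte, syllable in candidate_syllables(b"\xec\x86").items():
    print(f"0x{byte:02x} -> {syllable}")
```

For the prefix 0xec, 0x86, the byte 0x81 gives 솁 (the choice made in the answer) and 0x90 gives 손, so someone who reads Korean may be able to pick the most plausible name from the short candidate list.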

    Example (manual conversion; decimal values are the Windows-1252 codes of the literal non-ASCII characters, and the added 0x81 bytes are marked by ↑↑↑↑ under the affected strings):

    16 0xec,134,0xa1,0xec,138,0xb9,0xec,132,0xa0
    송승선
    17 0xea,0xb0,149,0xec,0xa2,133,0xea,0xb5,0xac,0x2c,0x20,0xec,134,0x81,0xec,0xa3,0xbc,0xed,153,152
    강종구, 솁주환                                                    ↑↑↑↑
    18 0xea,0xb5,0xac,0xec,0xa2,133,0xeb,0xa7,140
    구종만
    19 0xec,156,0xa4,0xec,0xb0,0xbd,0xec,0xbc,0x81
    윤창켁                                     ↑↑↑↑   
    20 0xea,0xb6,140,0xeb,0xaf,0xbc,0xec,0xa3,0xbc,0x20,0x28,0xe2,152,133,0xe2,152,133,0x29
    권민주 (★★)
    

    Google Translate:

    detected Korean