Tags: javascript, google-apps-script, encoding, character-encoding, urlencode

How to generate a Shift_JIS(SJIS) percent encoded string in JavaScript


I'm new to both JavaScript and Google Apps Script, and I'm having trouble converting text in a cell to a Shift-JIS (SJIS) percent-encoded string. For example, the Japanese string "あいう" should be encoded as "%82%A0%82%A2%82%A4", not as "%E3%81%82%E3%81%84%E3%81%86", which is the UTF-8 encoding.

I tried EncodingJS and the built-in urlencode() function, but both return the UTF-8 encoded result.

Could anyone tell me how to get the SJIS-encoded string properly in GAS? Thank you.


Solution

    • You want to URL-encode あいう as %82%A0%82%A2%82%A4, using Shift-JIS as the character set.
      • %E3%81%82%E3%81%84%E3%81%86 is the result when UTF-8 is used instead.
    • You want to do this in Google Apps Script.

    If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.

    Points of this answer:

    • To work with Shift-JIS in Google Apps Script, the data must be handled as binary data (a byte array). When Shift-JIS data is retrieved as a string in Google Apps Script, the character set is automatically changed to UTF-8, so the Shift-JIS bytes are lost. Please be careful about this.
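
    As a small illustration of why string-based encoding does not help here, the following plain JavaScript sketch (not specific to Google Apps Script) shows that the standard encodeURIComponent function always percent-encodes the UTF-8 bytes of a string. The variable name utf8Encoded is only for illustration.

```javascript
// encodeURIComponent is defined to percent-encode the UTF-8 bytes of a string,
// so it cannot produce Shift_JIS output on its own.
const utf8Encoded = encodeURIComponent("あいう");

console.log(utf8Encoded); // %E3%81%82%E3%81%84%E3%81%86
```

    This is the UTF-8 result from the question, which is why the scripts in this answer work on the Shift-JIS bytes directly.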

    Sample script 1:

    To convert あいう to %82%A0%82%A2%82%A4, how about the following script? This script works for hiragana characters.

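    function myFunction() {
      var str = "あいう";
    
      // Store the string as Shift_JIS and read it back as raw (signed) bytes.
      var bytes = Utilities.newBlob("").setDataFromString(str, "Shift_JIS").getBytes();
      // Mask each byte to 0-255, format it as two hex digits, and prefix it with "%".
      var res = bytes.map(function(byte) {return "%" + ("0" + (byte & 0xFF).toString(16)).slice(-2)}).join("").toUpperCase();
      Logger.log(res);
    }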
    
    Result:

    You can see the following result in the log.

    %82%A0%82%A2%82%A4
    

    Sample script 2:

    If you want to convert values that include kanji characters, how about the following script? In this case, 本日は晴天なり is converted to %96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8.

    function myFunction() {
      var str = "本日は晴天なり";
      // Hex codes of the ASCII characters that are output as-is instead of being percent-encoded.
      var conv = Utilities.newBlob("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz*-.@_").getBytes().map(function(e) {return ("0" + (e & 0xFF).toString(16)).slice(-2)});
      // Store the string as Shift_JIS and read it back as raw (signed) bytes.
      var bytes = Utilities.newBlob("").setDataFromString(str, "Shift_JIS").getBytes();
      var res = bytes.map(function(byte) {
        var n = ("0" + (byte & 0xFF).toString(16)).slice(-2);
        // Every code in conv is below 0x80, so it maps directly to an ASCII character.
        return conv.indexOf(n) != -1 ? String.fromCharCode(parseInt(n, 16)) : ("%" + n).toUpperCase();
      }).join("");
      Logger.log(res);
    }
    
    Result:

    You can see the following result in the log.

    %96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8
    
    • When 本日は晴天なり is converted with sample script 1, it becomes %96%7B%93%FA%82%CD%90%B0%93%56%82%C8%82%E8. This can also be decoded correctly, but the form produced by sample script 2 seems to be the one generally used.
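
    To check that both forms describe the same bytes, here is a minimal sketch in plain JavaScript. The helper percentDecodeToBytes is a hypothetical name introduced only for this illustration, and it assumes well-formed input in which every unescaped character is ASCII.

```javascript
// Hypothetical helper: turn a percent-encoded string into unsigned byte values.
// Assumes well-formed input and that unescaped characters are plain ASCII.
function percentDecodeToBytes(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === "%") {
      // "%XX": read the two hex digits as one byte and skip past them.
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2;
    } else {
      // A literal character stands for its own ASCII code.
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return bytes;
}

const fromScript1 = percentDecodeToBytes("%96%7B%93%FA%82%CD%90%B0%93%56%82%C8%82%E8");
const fromScript2 = percentDecodeToBytes("%96%7B%93%FA%82%CD%90%B0%93V%82%C8%82%E8");

console.log(fromScript1.join(",") === fromScript2.join(",")); // true
```

    The only difference is that 0x56 appears as %56 in one string and as the literal V in the other.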

    Flow:

    The flow of this script is as follows.

    1. Create a new, empty blob.
    2. Put the text value あいう into the blob, specifying Shift-JIS as the character set.
      • Even if blob.getDataAsString("Shift_JIS") is used, the resulting string is UTF-8. So the blob must be handled as binary data, without converting it to a string. This is the important point of this answer.
    3. Convert the blob to a byte array.
    4. Convert the signed bytes to unsigned values and format them as hexadecimal.
      • In Google Apps Script, byte arrays hold signed values (-128 to 127), so each byte must be converted to an unsigned value (0 to 255) before it is formatted as hexadecimal.
      • With kanji characters, one of a character's 2 bytes may correspond to an ASCII character. In that case, the ASCII character is used as-is instead of the percent-encoded form. "Sample script 2" handles this situation.
        • In the sample above, 天 (bytes 0x93 0x56) becomes %93V.
    5. Prefix the hexadecimal value of each byte with %.
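
    Steps 4 and 5 can be sketched in plain JavaScript that runs outside Google Apps Script. This is only an illustration under some assumptions: the signed byte values are hard-coded in place of what getBytes() is described as returning, and the names percentEncodeBytes, keepLiterals, and LITERAL are hypothetical ones introduced for this example.

```javascript
// Characters that "Sample script 2" leaves unescaped (same set as its conv string).
const LITERAL = /^[0-9A-Za-z*\-.@_]$/;

// Hypothetical helper: percent-encode an array of signed bytes (-128 to 127).
// When keepLiterals is true, bytes matching LITERAL are emitted as characters.
function percentEncodeBytes(signedBytes, keepLiterals) {
  return signedBytes
    .map(function (b) {
      const unsigned = b & 0xff; // step 4: signed -> unsigned (0 to 255)
      const ch = String.fromCharCode(unsigned);
      if (keepLiterals && LITERAL.test(ch)) {
        return ch;
      }
      // step 5: two uppercase hex digits prefixed with "%"
      return "%" + ("0" + unsigned.toString(16)).slice(-2).toUpperCase();
    })
    .join("");
}

// Shift_JIS bytes of あいう (0x82 0xA0 0x82 0xA2 0x82 0xA4) written as signed values.
const hiragana = [-126, -96, -126, -94, -126, -92];
// Shift_JIS bytes of 天 (0x93 0x56) written as signed values.
const ten = [-109, 86];

console.log(percentEncodeBytes(hiragana, false)); // %82%A0%82%A2%82%A4
console.log(percentEncodeBytes(ten, false));      // %93%56
console.log(percentEncodeBytes(ten, true));       // %93V
```

    Masking with 0xFF is what turns a value such as -126 into 130 (0x82); without it, toString(16) would produce a negative hexadecimal string.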


    If I misunderstood your question and this was not the direction you want, I apologize.