
URL-encoding form data with the windows-1252 charset in Node.js


I need to post a form that has been set to use the windows-1252 charset for URL-encoding its data. For simple characters the default encoding (UTF-8) works, but the special characters have to be encoded with the required charset.

The npm "request" package I am using does not allow setting a specific charset and uses UTF-8 by default underneath. I tried another package, "Restler", which allows the encoding to be set, but it throws an exception saying invalid charset when I specify windows-1252 (Node only offers a handful of encodings through the Buffer class, and windows-1252 is not one of them).

Please let me know whether what I am trying to achieve is even possible in Node or not. For verification purposes, I created a small client in Java using Apache's HTTP client library with windows-1252 encoding, and my request was successfully accepted by the server. So far, I have not been able to figure this out in Node.


Solution

  • Sending HTTP request data in a legacy encoding like Windows-1252 is not straightforward in Node, as there is no native support for these encodings.

    Support can be added in the form of an iconv library, so it's definitely doable, even if it does not work out of the box.
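To make the "no native support" point concrete: Node's `Buffer` has no `windows-1252` encoding, and the nearest built-in, `latin1`, is not a drop-in substitute. Per the Node documentation, `latin1` truncates characters above U+00FF, so characters that Windows-1252 places in the 0x80-0x9F range (such as ‘ and €) come out as the wrong byte. The snippet below uses plain Node with no packages; the helper name `latin1Hex` is hypothetical and only there for illustration.

```javascript
// Node's Buffer has no Windows-1252 codec of its own.
console.log(Buffer.isEncoding("windows-1252")); // false
console.log(Buffer.isEncoding("latin1"));       // true

// Hypothetical helper: encode with Node's built-in 'latin1' and show the bytes as hex.
function latin1Hex(str) {
  return Buffer.from(str, "latin1").toString("hex");
}

// U+00E9 ('é') is below U+0100, so latin1 and Windows-1252 agree: 0xE9.
console.log(latin1Hex("\u00e9")); // e9

// U+2018 ('‘') is 0x91 in Windows-1252, but latin1 truncates it to 0x18.
console.log(latin1Hex("\u2018")); // 18

// U+20AC ('€') is 0x80 in Windows-1252, but latin1 truncates it to 0xAC.
console.log(latin1Hex("\u20ac")); // ac
```

This is why a real charset converter (an iconv-style library) is needed rather than one of the built-in `Buffer` encodings.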

    The following targets Restler, because you are using it, but in principle this applies to any client HTTP library.

    Notes:

    • Traditional HTTP POSTs are URL-encoded; we will use qs for this.
    • Support for encodings other than UTF-8 will be provided by qs-iconv, as documented in the qs README section "Dealing with special character sets".
    • Restler usually encodes data as UTF-8 if you pass it as a string or plain object, but if you pass a Buffer, Restler sends it as is.
    • Setting a proper Content-Type and Content-Length will ensure the data can be interpreted properly at the receiving end. Since we supply our own data here, we need to set those headers manually.
    • Be aware that any character that is not contained in the target charset (Windows-1252 in this case) will be encoded as ? by iconv (%3F in URL form) and therefore will be lost.
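For readers who want to see roughly what the qs + qs-iconv combination does, here is a dependency-free sketch of the same idea: convert each string to Windows-1252 bytes, substitute `?` for anything the charset cannot represent, then percent-encode every byte outside the RFC 3986 unreserved set. It is not the qs-iconv source. The names `WIN1252_EXTRA`, `toWin1252Bytes`, `encodeWin1252` and `formEncodeWin1252` are hypothetical, it only handles flat objects of strings, and it encodes spaces as `%20` rather than `+` (form parsers decode both as a space).

```javascript
// Unicode code point -> Windows-1252 byte for the 0x80-0x9F range, the only
// range where Windows-1252 differs from ISO-8859-1 (Latin-1).
// Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned and have no entry.
const WIN1252_EXTRA = new Map([
  [0x20ac, 0x80], [0x201a, 0x82], [0x0192, 0x83], [0x201e, 0x84],
  [0x2026, 0x85], [0x2020, 0x86], [0x2021, 0x87], [0x02c6, 0x88],
  [0x2030, 0x89], [0x0160, 0x8a], [0x2039, 0x8b], [0x0152, 0x8c],
  [0x017d, 0x8e], [0x2018, 0x91], [0x2019, 0x92], [0x201c, 0x93],
  [0x201d, 0x94], [0x2022, 0x95], [0x2013, 0x96], [0x2014, 0x97],
  [0x02dc, 0x98], [0x2122, 0x99], [0x0161, 0x9a], [0x203a, 0x9b],
  [0x0153, 0x9c], [0x017e, 0x9e], [0x0178, 0x9f],
]);

// Convert a string to Windows-1252 bytes; unmappable characters become '?' (0x3F).
function toWin1252Bytes(str) {
  const bytes = [];
  for (const ch of str) {              // iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp < 0x80 || (cp >= 0xa0 && cp <= 0xff)) {
      bytes.push(cp);                  // same byte as ISO-8859-1
    } else if (WIN1252_EXTRA.has(cp)) {
      bytes.push(WIN1252_EXTRA.get(cp));
    } else {
      bytes.push(0x3f);                // '?': the character is lost
    }
  }
  return Buffer.from(bytes);
}

// Percent-encode every byte except the unreserved set A-Z a-z 0-9 - . _ ~
function encodeWin1252(str) {
  let out = "";
  for (const b of toWin1252Bytes(str)) {
    const unreserved =
      (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x5a) ||
      (b >= 0x61 && b <= 0x7a) || b === 0x2d || b === 0x2e ||
      b === 0x5f || b === 0x7e;
    out += unreserved
      ? String.fromCharCode(b)
      : "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

// Build an application/x-www-form-urlencoded body from a flat object of strings.
function formEncodeWin1252(data) {
  return Object.keys(data)
    .map((key) => encodeWin1252(key) + "=" + encodeWin1252(String(data[key])))
    .join("&");
}

console.log(formEncodeWin1252({ key1: "\u2018value1\u2018", key2: "\u2018value2\u2018" }));
// key1=%91value1%91&key2=%91value2%91
```

A character outside Windows-1252, such as 漢, comes out of `encodeWin1252` as `%3F`, which is exactly the data loss described in the last note.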

    Code:

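    var rest = require('restler');
    var qs = require('qs');
    // qs-iconv builds a qs-compatible encoder that converts strings to
    // Windows-1252 bytes before percent-encoding them.
    var win1252 = require('qs-iconv/encoder')('win1252');
    
    var requestData = {
      key1: "‘value1‘",
      key2: "‘value2‘"
    };
    
    // Percent-encode keys and values as Windows-1252 instead of UTF-8.
    var requestBody = qs.stringify(requestData, { encoder: win1252 });
    // => "key1=%91value1%91&key2=%91value2%91"
    
    // The encoded body is plain ASCII; wrapping it in a Buffer stops Restler
    // from re-encoding it. Buffer.from replaces the deprecated new Buffer().
    var requestBuf = Buffer.from(requestBody);
    
    rest.post('your/url', {
      data: requestBuf,
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded; charset=windows-1252',
        // Length in bytes of the body actually sent.
        'Content-Length': requestBuf.length
      }
    }).on('complete', function(data) {
      console.log(data);
    });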
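Since the approach does not depend on Restler, the same pre-encoded body could also be sent with Node's built-in `http` module (use `https` for an `https:` URL). This is a sketch, assuming the body has already been produced by `qs.stringify(requestData, { encoder: win1252 })` as in the answer's code; the function names `buildFormPost` and `postForm` are hypothetical, and the UTF-8 decoding of the response is an assumption about the server.

```javascript
const http = require("http");

// Wrap an already percent-encoded form body in a Buffer and compute the headers.
// The body is plain ASCII at this point, so its byte length equals its string length.
function buildFormPost(body) {
  const buf = Buffer.from(body, "ascii");
  return {
    buf,
    headers: {
      "Content-Type": "application/x-www-form-urlencoded; charset=windows-1252",
      "Content-Length": buf.length,
    },
  };
}

// POST the body to an http: URL and pass the status code and response text to the callback.
function postForm(url, body, callback) {
  const { buf, headers } = buildFormPost(body);
  const req = http.request(url, { method: "POST", headers }, (res) => {
    let text = "";
    res.setEncoding("utf8"); // assumption: the server replies in UTF-8
    res.on("data", (chunk) => { text += chunk; });
    res.on("end", () => callback(null, res.statusCode, text));
  });
  req.on("error", (err) => callback(err));
  req.end(buf);
}

// Usage (the URL is a placeholder):
// postForm("http://localhost:8080/your/url", "key1=%91value1%91&key2=%91value2%91",
//   (err, status, text) => {
//     if (err) throw err;
//     console.log(status, text);
//   });
```

The design mirrors the Restler version: the charset work happens entirely before the request, so the HTTP layer only ever sees ASCII bytes plus headers that tell the server how to interpret them.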