javascript macos encoding utf-8 iso-8859-1

JavaScript: download a CSV file encoded in ISO-8859-1 / Latin-1 / Windows-1252


I have hacked together a small tool to extract shipping data from Amazon CSV order data. It works so far. Here is a simplified version as a JS Bin: http://output.jsbin.com/jarako

To print stamps and shipping labels, I need a file that I can upload to Deutsche Post and to other parcel services. I use a small function, saveTextAsFile, which I found on Stack Overflow. So far so good: special characters (äöüß...) are displayed correctly in the output textarea and in the downloaded files.

All of these German postal and parcel service sites accept only Latin-1 / ISO-8859-1 encoded files for upload, but my downloaded file is always UTF-8. When I upload it, all special characters (äöüß...) come out wrong.

How can I change this? I have already searched a lot. For example, I have tried:

Setting the charset of the tool's page to ISO-8859-1:

<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

But the result is that the special characters are now wrong in the output textarea as well as in the downloaded file. If I upload the file to the postal site, even more characters are wrong. And when I check the encoding in the CODA editor, it still says the downloaded file is UTF-8.

The saveTextAsFile function uses var textFileAsBlob = new Blob([textToWrite], {type:'text/plain'});. Maybe there is a way to set the charset for the download there?

function saveTextAsFile()
{
    var textToWrite = $('#dataOutput').val();
    var textFileAsBlob = new Blob([textToWrite], {type:'text/plain'});
    var fileNameToSaveAs = "Brief.txt";

    var downloadLink = document.createElement("a");
    downloadLink.download = fileNameToSaveAs;
    downloadLink.innerHTML = "Download File";
    if (window.webkitURL != null)
    {
        // Chrome allows the link to be clicked
        // without actually adding it to the DOM.
        downloadLink.href = window.webkitURL.createObjectURL(textFileAsBlob);
    }
    else
    {
        // Firefox requires the link to be added to the DOM
        // before it can be clicked.
        downloadLink.href = window.URL.createObjectURL(textFileAsBlob);
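        downloadLink.onclick = function (event) {
            // Remove the temporary link from the DOM once it has been clicked.
            document.body.removeChild(event.target);
        };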
        downloadLink.style.display = "none";
        document.body.appendChild(downloadLink);
    }

    downloadLink.click();
}

Anyhow, there has to be a way to download a file in an encoding other than the one the site itself uses. The Amazon site where I download the CSV file is UTF-8 encoded, but the CSV file downloaded from there is Latin-1 (ISO-8859-1) when I check it in CODA.


Solution

  • SCROLL DOWN TO THE UPDATE for the real solution!

    Because I got no answer, I kept searching. It looked as if there were NO SOLUTION in JavaScript: every test download that I generated in JavaScript was UTF-8 encoded. JavaScript strings are Unicode, and a Blob built from a string is written out as UTF-8. A different encoding would (possibly) apply only if the data were sent through another HTTP transfer, but for JavaScript running on the client no further HTTP transfer happens, because the data is already on the client.

    As a workaround I wrote a small PHP script on my server, to which I send the data via a GET or POST request. It converts the encoding to Latin-1 / ISO-8859-1 and returns the result as a file download. The result is an ISO-8859-1 file with correctly encoded special characters, which I can upload to the postal and parcel service sites mentioned above, and everything looks good.

    latin-download.php: (It is VERY IMPORTANT to save the PHP file itself in ISO-8859-1 as well, to make it work!)
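To see why the generated download is always UTF-8, it helps to look at the bytes. The following is only a minimal sketch, assuming a runtime with the standard TextEncoder (modern browsers or a recent Node.js); utf8Bytes and toHex are hypothetical helper names. The built-in encoder produces UTF-8, which is also how a string is serialized when it is passed to the Blob constructor.

```javascript
// Hypothetical helpers for illustration only.
// Returns the UTF-8 bytes of a string as a plain array of numbers.
function utf8Bytes(text) {
    // The built-in TextEncoder always encodes to UTF-8.
    return Array.from(new TextEncoder().encode(text));
}

// Formats a byte array as lowercase hex pairs separated by spaces.
function toHex(bytes) {
    return bytes.map(function (b) {
        return b.toString(16).padStart(2, "0");
    }).join(" ");
}

// "\u00e4\u00f6\u00fc\u00df" is "äöüß", written with escapes so the
// example does not depend on the encoding of the source file itself.
var sample = "\u00e4\u00f6\u00fc\u00df";
console.log(toHex(utf8Bytes(sample))); // c3 a4 c3 b6 c3 bc c3 9f
```

Each of the four characters takes two bytes in UTF-8, whereas a Latin-1 file would contain exactly one byte per character (e4 f6 fc df). That difference is what the upload sites trip over.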

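    <?php
    // PHP has already URL-decoded the values in $_REQUEST, so urldecode() is not
    // called again; a second decode would corrupt data containing "%" or "+".
    $data = $_REQUEST["a"];
    $converted_to_latin = mb_convert_encoding($data, 'ISO-8859-1', 'UTF-8');
    // Strip path parts and quotes from the client-supplied file name.
    $filename = str_replace('"', '', basename($_REQUEST["filename"]));
    // Content-Type and Content-Disposition are two separate headers.
    header('Content-Type: text/plain; charset=iso-8859-1');
    header('Content-Disposition: attachment; filename="' . $filename . '"');
    echo $converted_to_latin;
    ?>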
    

    In my JavaScript code I use:

    <a id="downloadlink">Download File</a>
    
    <script>
    var mydata = "this is testdata containing äöüß";
    
    document.getElementById("downloadlink").addEventListener("click", function() {
        var mydataToSend = encodeURIComponent(mydata);
        window.open("latin-download.php?a=" + mydataToSend + "&filename=letter-max.csv");
    }, false);
    </script>
    

    For larger amounts of data you have to switch from GET to POST.
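The switch from GET to POST could look roughly like this. It is only a sketch under some assumptions: postForDownload is a hypothetical helper name, the page is assumed to be served as UTF-8 so that the form fields arrive as UTF-8 in latin-download.php (which reads them through $_REQUEST), and the document object is passed in as a parameter rather than taken from the global scope.

```javascript
// Hypothetical helper: submits the fields to `url` through a temporary
// hidden form, so the data travels in the POST body instead of the URL.
function postForDownload(doc, url, fields) {
    var form = doc.createElement("form");
    form.method = "POST";
    form.action = url;

    // One hidden input per field; the browser does the URL encoding itself,
    // so encodeURIComponent() is not needed here.
    Object.keys(fields).forEach(function (name) {
        var input = doc.createElement("input");
        input.type = "hidden";
        input.name = name;
        input.value = fields[name];
        form.appendChild(input);
    });

    // The form is attached to the document only for the duration of the submit.
    doc.body.appendChild(form);
    form.submit();
    doc.body.removeChild(form);
    return form;
}

// Possible use inside the click handler, with the field names that
// latin-download.php expects:
// postForDownload(document, "latin-download.php",
//                 { a: mydata, filename: "letter-max.csv" });
```

Because the response carries a Content-Disposition: attachment header, the browser should offer the file for download instead of navigating away from the page.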

    UPDATE 08-Feb-2016

    Half a year later I have found a solution in PURE JAVASCRIPT, using inexorabletash/text-encoding, a polyfill for the Encoding Living Standard. The standard includes decoding of legacy encodings such as Latin-1 ("windows-1252"), but it forbids encoding into them. So the browser's built-in window.TextEncoder offers only UTF encoding. The polyfill, however, offers a legacy mode that also ALLOWS encoding into legacy encodings such as Latin-1.

    I use it like this:

    <!DOCTYPE html>
    <script>
    // Keep a reference to the browser's built-in TextEncoder as TextEncoderOrg
    // (it can NOT encode windows-1252, but it stays usable as TextEncoderOrg())
    var TextEncoderOrg = window.TextEncoder;
    // ... and deactivate it, so that only the polyfill loaded below is used
    window.TextEncoder = null;
    
    </script>
    <script src="lib/encoding-indexes.js"></script>  <!-- needed to support encoding to legacy encodings -->
    <script src="lib/encoding.js"></script>  <!-- encoding polyfill -->
    
    <script>
    
    function download (content, filename, contentType) {
        if(!contentType) contentType = 'application/octet-stream';
        var a = document.createElement('a');
        var blob = new Blob([content], {'type':contentType});
        a.href = window.URL.createObjectURL(blob);
        a.download = filename;
        a.click();
    }
    
    var text = "Es wird ein schöner Tag!";
    
    // Do the encoding
    var encoded = new TextEncoder("windows-1252",{ NONSTANDARD_allowLegacyEncoding: true }).encode(text);
    
    // Download 2 files to see the difference
    download(encoded,"windows-1252-encoded-text.txt");
    download(text,"utf-8-original-text.txt");
    
    </script>
    

    The encoding-indexes.js file is about 500 KB, because it contains all the encoding tables. Since I need only windows-1252 encoding, I deleted the other encodings from this file for my use, so only 632 bytes are left.
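If only a single-byte target is needed, the table lookup can also be sketched by hand without the library. The following is not part of the polyfill: encodeLatin1 is a hypothetical helper that covers strict ISO-8859-1 only, where every character up to U+00FF maps to the byte with the same value. windows-1252 differs in the range 0x80 to 0x9F (the euro sign, for example), so characters from that range are replaced with "?" here.

```javascript
// Hypothetical helper, not part of the text-encoding polyfill.
// Encodes a string as ISO-8859-1: one byte per UTF-16 code unit.
function encodeLatin1(text, replacement) {
    // Byte used for characters outside ISO-8859-1; defaults to "?" (0x3F).
    var fallback = (replacement === undefined) ? 0x3F : replacement;
    var bytes = new Uint8Array(text.length);
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        // U+0000..U+00FF map directly; anything else becomes the fallback.
        // A character outside the BMP consists of two code units and
        // therefore yields two fallback bytes.
        bytes[i] = code <= 0xFF ? code : fallback;
    }
    return bytes;
}

// "sch\u00f6ner" is "schöner", written with an escape so the example
// does not depend on the encoding of the source file itself.
var latin1 = encodeLatin1("Es wird ein sch\u00f6ner Tag!");
console.log(latin1.length); // 24, one byte per character
```

The resulting Uint8Array can be passed to the download function above in place of the polyfill output, since a Blob writes typed arrays byte for byte without re-encoding them.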