Problem changing text file encoding using cmd


I was recently given a task to change the encoding of a .txt file to Windows-1251, OEM866 and UTF-8 using only cmd. I tried:

  1. chcp 866
  2. cmd /u /c /d type 1.txt > 866.txt

But the resulting text file had UTF-16 encoding, despite looking like OEM866 text.
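The UTF-16 result is consistent with how `cmd /?` describes the `/U` switch: it makes the output of internal commands (such as `type`) to a pipe or file Unicode, whatever code page `chcp` selected. As an illustration outside the cmd-only constraint, the Python sketch below compares the two byte layouts; the sample string is an assumption, since the question does not show the contents of 1.txt.

```python
# Sketch: the same Cyrillic text as OEM866 bytes and as UTF-16LE bytes.
# The sample string is an assumption; the question does not show 1.txt.
text = "русский текст"

oem866_bytes = text.encode("cp866")     # one byte per character
utf16_bytes = text.encode("utf-16-le")  # two bytes per character, no BOM

print(len(oem866_bytes), oem866_bytes.hex(" "))
print(len(utf16_bytes), utf16_bytes.hex(" "))
```

Both byte strings decode to the same text, which would explain why the file can look like OEM866 text in a viewer while being stored as UTF-16.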

Solution

  • I'd say that the task (converting given files from one encoding to another, as the iconv tool does) is solvable using only cmd. First, create two auxiliary binary files, bomUtf16le.bin and bomUtf8.bin, as follows:

    REM do not run as a batch file; copy&paste the code into an open cmd window
    
    :: create a testing folder and change the current directory
    2>NUL md .\SO\69595742
    pushd    .\SO\69595742
    
    :: create file bomUtf16le.bin (BOM, encoding utf16LE)
    >NUL chcp 1252
    <nul set /p x=ÿþ>bomUtf16le.bin
    :: create file bomUtf8.bin    (BOM, encoding utf8)
    >NUL chcp 1252
    <nul set /p x=ï»¿>bomUtf8.bin
    
    :: create file a1200.txt (a Cyrillic text, encoding utf16LEbom)
    >NUL copy /Y /B bomUtf16le.bin a1200.txt 
    cmd /U /D /C "(echo русский текст&echo кирилловский шрифт)>>a1200.txt"
    
    popd
    

    Important: do not run the above code snippet from a batch file; copy&paste the code into an open cmd window!
    The code creates an initial test file a1200.txt (encoding utf16LEbom). We could begin with a file in any supported encoding, 1251 or 866 or 65001 (== Utf8bom), because the conversions below are designed to work cyclically (verified by binary comparison using the fc command, and manually confirmed by opening all files in Notepad++). The following code snippet assumes that the initial test file is encoded as utf16LEbom.
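The `set /p` trick relies on code page 1252 mapping the typed characters to the raw BOM bytes: ÿþ becomes FF FE (the UTF-16LE BOM) and ï»¿ becomes EF BB BF (the UTF-8 BOM). A small Python cross-check of that mapping, offered only as a sketch (the variable names are illustrative):

```python
import codecs

# The characters placed after "set /p x=", as they appear under code page 1252.
utf16le_bom_chars = "ÿþ"
utf8_bom_chars = "ï»¿"

# Encoding them with cp1252 yields the bytes that end up in the .bin files.
utf16le_bom = utf16le_bom_chars.encode("cp1252")
utf8_bom = utf8_bom_chars.encode("cp1252")

print(utf16le_bom.hex(" "), utf16le_bom == codecs.BOM_UTF16_LE)
print(utf8_bom.hex(" "), utf8_bom == codecs.BOM_UTF8)
```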

    Then run the following (run as a batch file, or copy&paste the code into an open cmd window):

    @ECHO OFF
    SETLOCAL EnableExtensions
    
    :: run as a batch file, or copy&paste the code into an open cmd window
    
    2>NUL md .\SO\69595742
    pushd    .\SO\69595742
    
    :: convert file a1200.txt to cp1251
    >NUL chcp 1251
    type a1200.txt>x1251.txt
    
    :: convert file a1200.txt to cp866
    >NUL chcp 866
    type a1200.txt>x866.txt
    
    :: convert file a1200.txt to utf-8 BOM
    >NUL copy /Y /B bomUtf8.bin x65001bom.txt
    >NUL chcp 65001
    type a1200.txt>>x65001Bom.txt
    
    :: convert file x866.txt to file x1200.txt (encoding utf16LEbom)
    >NUL copy /Y /B bomUtf16le.bin x1200.txt
    >NUL chcp 866
    cmd /U /D /C "type x866.txt>>x1200.txt"
    
    :: Perform a binary comparison (FC: no differences encountered)
    fc /B x1200.txt a1200.txt
    
    :: convert file x1251.txt to file y1200.txt (encoding utf16LEbom)
    :: analogous to: x866.txt to file x1200.txt
    >NUL copy /Y /B bomUtf16le.bin y1200.txt
    >NUL chcp 1251
    cmd /U /D /C "type x1251.txt>>y1200.txt"
    
    :: Perform a binary comparison (FC: no differences encountered)
    fc /B y1200.txt a1200.txt
    
    :: convert file x65001bom.txt to file z1200.txt (encoding utf16LEbom)
    >NUL chcp 65001
    cmd /U /D /C "type x65001bom.txt>z1200.txt"
    
    :: Perform a binary comparison (FC: no differences encountered)
    fc /B z1200.txt a1200.txt
    
    :: convert file a1200.txt to x65001noBom.txt (utf-8 no BOM, merely for completeness)
    >NUL chcp 65001
    type a1200.txt>x65001noBom.txt
    
    dir *.txt | findstr /I "\.txt$"
    
    popd
    
    goto :eof
    

    Result: .\SO\69595742.bat

    Comparing files x1200.txt and A1200.TXT
    FC: no differences encountered
    
    Comparing files y1200.txt and A1200.TXT
    FC: no differences encountered
    
    Comparing files z1200.txt and A1200.TXT
    FC: no differences encountered
    
    17/10/2021  19:24                72 a1200.txt
    17/10/2021  21:49                72 x1200.txt
    17/10/2021  21:49                35 x1251.txt
    17/10/2021  21:49                67 x65001Bom.txt
    17/10/2021  21:49                64 x65001noBom.txt
    17/10/2021  21:49                35 x866.txt
    17/10/2021  21:49                72 y1200.txt
    17/10/2021  21:49                72 z1200.txt
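The file sizes in the listing can be checked independently. The sketch below assumes that the two `echo` commands wrote each line followed by CRLF, and recomputes the expected byte counts in Python:

```python
import codecs

# Text written by the two echo commands; each line is assumed to end in CRLF.
text = "русский текст\r\nкирилловский шрифт\r\n"

expected_sizes = {
    "a1200.txt": len(codecs.BOM_UTF16_LE + text.encode("utf-16-le")),
    "x1251.txt": len(text.encode("cp1251")),
    "x866.txt": len(text.encode("cp866")),
    "x65001Bom.txt": len(codecs.BOM_UTF8 + text.encode("utf-8")),
    "x65001noBom.txt": len(text.encode("utf-8")),
}

for name, size in expected_sizes.items():
    print(f"{size:>3} {name}")
```

The computed values (72, 35, 35, 67 and 64 bytes) agree with the `dir` output above: the single-byte code pages use one byte per character, UTF-16LE uses two plus a 2-byte BOM, and UTF-8 uses two bytes per Cyrillic letter plus an optional 3-byte BOM.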
    

    Summary (incomplete): file conversions (⇆ reversible)

    Direct:

    • utf-16-le-bom ⇆ cp866
    • utf-16-le-bom ⇆ cp1251
    • utf-16-le-bom ⇆ utf-8-bom
    • utf-16-le-bom → utf-8-noBom

    Possible (thru an auxiliary file):

    • cp866 ⇆ utf-16-le-bom ⇆ cp1251
    • cp866 ⇆ utf-16-le-bom ⇆ utf-8-bom
    • utf-8-bom ⇆ utf-16-le-bom ⇆ cp1251

    Possible utf-8-noBom → utf-8-bom as follows:

    copy /B bomUtf8.bin + fileutf-8-noBom.txt fileutf-8-bom.txt
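The conversions through an auxiliary UTF-16LE file, and the BOM-prepending `copy /B`, can be mirrored in Python to see why they are reversible for text that every involved code page can represent. This is a sketch only; `to_utf16le_bom` and `from_utf16le_bom` are hypothetical helper names, not part of the answer's cmd method.

```python
import codecs


def to_utf16le_bom(data: bytes, src: str) -> bytes:
    """Decode bytes from the src encoding and store them as UTF-16LE with BOM."""
    return codecs.BOM_UTF16_LE + data.decode(src).encode("utf-16-le")


def from_utf16le_bom(data: bytes, dst: str) -> bytes:
    """Read UTF-16 with BOM and re-encode to dst.

    The "utf-16" codec consumes the BOM and uses it to pick the byte order.
    """
    return data.decode("utf-16").encode(dst)


text = "русский текст\r\n"
as_866 = text.encode("cp866")

# cp866 -> utf-16-le-bom -> cp1251, then back the same way.
pivot = to_utf16le_bom(as_866, "cp866")
as_1251 = from_utf16le_bom(pivot, "cp1251")
back = from_utf16le_bom(to_utf16le_bom(as_1251, "cp1251"), "cp866")
print(back == as_866)

# utf-8-noBom -> utf-8-bom: plain concatenation, like copy /B bomUtf8.bin + file.
no_bom = text.encode("utf-8")
with_bom = codecs.BOM_UTF8 + no_bom
print(with_bom == text.encode("utf-8-sig"))
```

The round trip only holds when every character exists in both single-byte code pages; a character missing from the target code page would make the `encode` step fail instead of converting silently.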
    

    Tested in Windows 10 with the Administrative language settings shown in the original answer's screenshot (not reproduced here); not tested with the Beta checkbox in those settings unticked.