Search code examples
csvbatch-filequotation-marks

Batch: Convert .csv to tab-delimited text, only some fields are quoted, contain commas between quotes (eBay order file)


I'm trying to convert the eBay File Exchange download into a tab-delimited format my shipping software can read.

If each and every column were quoted, this would be easy--but they're not. Only some columns (name, item listing title, etc) are quoted, and some quoted columns contain commas. The rest are bare of quotes.

I need a way to parse and convert this in a .bat file, but using comma as a delimiter splits the quoted fields if they contain a comma too, giving me unusable data. I'm certain there's a simple fix for this, I just can't figure it out.


Solution

  • Eric J is correct - solving this kind of problem with batch is not simple. But it is possible :-)

    The main problem is how to differentiate between quoted and unquoted commas - jeb solved a similar problem with quoted vs. unquoted semicolons at 'Pretty print' windows %PATH% variable - how to split on ';' in CMD shell. The code below looks very different, but the fundamental concept is the same.

    The code below should work for pretty much any CSV as long as all lines are less than ~8000 bytes long. Batch variable values are limited to 8191 bytes, and some characters are temporarily expanded to two bytes.

    The code assumes there are not any existing TABs within the CSV file.

    It does not modify any existing quotes.

    As I say, the code should work, but it will be painfully SLOW if you have a large file. You would be much better off with a .NET solution as Eric J suggested.

    @echo off
    setlocal disableDelayedExpansion
    
    set "file=optionalPathinfo\yourFile.csv"
    
    :: Define a TAB variable
    for /f "delims=" %%A in (
      'forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"'
    ) do set "TAB=%%A"
    
    
    :: Read each line from CSV, convert it, and write to new file with .new extension
    >"%file%.new" (
      for /f usebackq^ delims^=^ eol^= %%A in ("%file%") do (
        set "line=%%A"
        call :processLine
      )
    )
    exit /b
    
    
    :processLine
    setlocal enableDelayedExpansion
    
    :: Protect problem characters
    set "line=!line:@=@A!"
    set "line=!line:^=@K!"
    set "line=!line:&=@M!"
    set "line=!line:|=@P!"
    set "line=!line:<=@L!"
    set "line=!line:>=@G!"
    
    :: Mark commas with leading caret (escape)
    set "line=!line:,=^,!"
    
    :: Remove mark from unquoted commas, but first temporarily
    :: disable delayed expansion to protect any ! characters
    setlocal disableDelayedExpansion
    set ^"line=%line%"
    setlocal enableDelayedExpansion
    
    :: Protect remaining marked commas
    set "line=!line:^,=@C!"
    
    :: Convert remaining commas to TAB
    set "line=!line:,=%TAB%!"
    
    :: Restore protected characters
    set "line=!line:@C=,!"
    set "line=!line:@G=>!"
    set "line=!line:@L=<!"
    set "line=!line:@P=|!"
    set "line=!line:@M=&!"
    set "line=!line:@K=^!"
    set "line=!line:@A=@!"
    
    :: Write modified line
    echo(!line!
    exit /b