Search code examples
excelcsvawkvbscriptcygwin

How do I use awk under cygwin to print fields from an excel spreadsheet?


We seem to be seeing more and more questions about executing awk on Excel spreadsheets so here is a Q/A on how to do that specific thing.

I have this information in an Excel spreadsheet "$D/staff.xlsx" (where "$D" is the path to my Desktop):

Name   Position
Sue    Manager
Bill   Secretary
Pat    Engineer

and I want to print the Position field for a given Name, e.g. output Secretary given the input Bill.

I can currently save as CSV from Excel to get:

$ cat "$D/staff.csv"
Name,Position
Sue,Manager
Bill,Secretary
Pat,Engineer

and then run:

$ awk -F, -v name="Bill" '$1==name{print $2}' "$D/staff.csv"
Secretary

but this is just a small part of a larger task and so I have to be able to do this automatically from a shell script without manually opening Excel to export the CSV file. How do I do that from a Windows PC running cygwin?


Solution

  • The combination of the following VBS and shell scripts create a CSV file for each sheet in the Excel spreadsheet:

    $ cat xls2csv.vbs
    csv_format = 6
    
    Dim strFilename
    Dim objFSO
    Set objFSO = CreateObject("scripting.filesystemobject")
    strFilename = objFSO.GetAbsolutePathName(WScript.Arguments(0))
    If objFSO.fileexists(strFilename) Then
      Call Writefile(strFilename)
    Else
      wscript.echo "no such file!"
    End If
    Set objFSO = Nothing
    
    Sub Writefile(ByVal strFilename)
    Dim objExcel
    Dim objWB
    Dim objws
    
    Set objExcel = CreateObject("Excel.Application")
    Set objWB = objExcel.Workbooks.Open(strFilename)
    
    For Each objws In objWB.Sheets
      objws.Copy
      objExcel.ActiveWorkbook.SaveAs objWB.Path & "\" & objws.Name & ".csv", csv_format
      objExcel.ActiveWorkbook.Close False
    Next
    
    objWB.Close False
    objExcel.Quit
    Set objExcel = Nothing
    End Sub
    

    .

    $ cat xls2csv
    PATH="$HOME:$PATH"
    
    # the original XLS input file path components
    inXlsPath="$1"
    inXlsDir=$(dirname "$inXlsPath")
    xlsFile=$(basename "$inXlsPath")
    xlsBase="${xlsFile%.*}"
    
    # The tmp dir we'll copy the XLS to and run the tool on
    # to get the CSVs generated
    tmpXlsDir="/usr/tmp/${xlsBase}.$$"
    tmpXlsPath="${tmpXlsDir}/${xlsFile}"
    absXlsPath="C:/cygwin64/${tmpXlsPath}" # need an absolute path for VBS to work
    
    mkdir -p "$tmpXlsDir"
    
    trap 'rm -f "${tmpXlsDir}/${xlsFile}"; rmdir "$tmpXlsDir"; exit' 0
    
    cp "$inXlsPath" "$tmpXlsDir"
    
    cygstart "$HOME/xls2csv.vbs" "$absXlsPath"
    
    printf "Waiting for \"${tmpXlsDir}/~\$${xlsFile}\" to be created:\n" >&2
    while [ ! -f "${tmpXlsDir}/~\$${xlsFile}" ]
    do
        # VBS is done when this tmp file is created and later removed
        printf "." >&2
        sleep 1
    done
    printf " Done.\n" >&2
    
    printf "Waiting for \"${tmpXlsDir}/~\$${xlsFile}\" to be removed:\n" >&2
    while [ -f "${tmpXlsDir}/~\$${xlsFile}" ]
    do
        # VBS is done when this tmp file is removed
        printf "." >&2
        sleep 1
    done
    printf " Done.\n" >&2
    
    numFiles=0
    for file in "$tmpXlsDir"/*.csv
    do
        numFiles=$(( numFiles + 1 ))
    done
    
    if (( numFiles >= 1 ))
    then
        outCsvDir="${inXlsDir}/${xlsBase}.csvs"
        mkdir -p "$outCsvDir"
        mv "$tmpXlsDir"/*.csv "$outCsvDir"
    fi
    

    Now we execute the shell script which internally calls cygstart to run the VBS script to generate the CSV files (one per sheet) in a subdirectory under the same directory where the Excel file exists named based on the Excel file name (e.g. Excel file staff.xlsx produces CSVs directory staff.csvs):

    $ ./xls2csv "$D/staff.xlsx"
    Waiting for "/usr/tmp/staff.2700/~$staff.xlsx" to be created:
    .. Done.
    Waiting for "/usr/tmp/staff.2700/~$staff.xlsx" to be removed:
    . Done.
    

    There is only one sheet with the default name Sheet1 in the target Excel file "$D/staff.xlsx" so the output of the above is a file "$D/staff.csvs/Sheet1.csv":

    $ cat "$D/staff.csvs/Sheet1.csv"
    Name,Position
    Sue,Manager
    Bill,Secretary
    Pat,Engineer
    
    $ awk -F, -v name="Bill" '$1==name{print $2}' "$D/staff.csvs/Sheet1.csv"
    Secretary
    

    Also see What's the most robust way to efficiently parse CSV using awk? for how to then operate on those CSVs.

    See also https://stackoverflow.com/a/58879683/1745001 for how to do the opposite, i.e. call a cygwin bash command from a Windows batch file.