Tags: bash, windows-7, cygwin, syntax-error, line-endings

unable to solve "syntax error near unexpected token `fi'" - hidden control characters (CR) / Unicode whitespace


I am new to bash scripting and I'm just trying out new things and getting to grips with it.

Basically I am writing a small script to store the content of a file in a variable and then use that variable in an if statement.

Step by step, I have figured out how to store variables and then how to store the content of files in variables. I am now working on if statements.

My test if statement is very VERY basic. I just wanted to grasp the syntax before moving on to the more complicated if statements for my program.

My if statement is:

if [ "test" = "test" ]
then
    echo "This is the same"
fi

Simple, right? However, when I run the script I get the error:

syntax error near unexpected token `fi'

I have tried a number of things from this site as well as others, but I am still getting this error and I am unsure what is wrong. Could something on my computer be stopping the script from running?

Edit: here is the full code. Note that I also tried deleting all the commented-out code and using just the if statement, and I still got the same error.

#!/bin/bash
#this stores a simple variable with the content of the file testy1.txt
#DATA=$(<testy1.txt)
#This echos out the stored variable
#echo $DATA
#simple if statement
if [ "test" = "test" ]
then
    echo "has value"
fi

Solution

  • To complement Jens's helpful answer, which explains the symptoms well and offers a utility-based solution (dos2unix): sometimes installing a third-party utility is undesirable, so here's a solution based on the standard utility tr:

    tr -d '\r' < script > script.tmp && mv script.tmp script
    

    This removes all \r (CR) characters from the input, saves the output to a temporary file, and then replaces the original file.

    • While this blindly removes \r instances even if they're not part of \r\n (CRLF) pairs, it's usually safe to assume that \r instances indeed only occur as part of such pairs.
    • Solutions with other standard utilities (awk, sed) are possible too - see this answer of mine.
      If your sed implementation offers the -i option for in-place updating, it may be the simpler choice.
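A minimal, self-contained sketch of the tr approach, run against a throwaway file rather than a real script. The temp directory and the file name crlf-demo.sh are illustrative assumptions; bash is assumed for the $'\r' syntax. It recreates the asker's if statement with CRLF endings, strips the CRs, and confirms that none remain and that the script now runs:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so no real script is touched.
dir=$(mktemp -d)
script="$dir/crlf-demo.sh"   # hypothetical file name

# Recreate the problem: the asker's if statement saved with CRLF endings.
printf 'if [ "test" = "test" ]\r\nthen\r\n    echo "has value"\r\nfi\r\n' > "$script"

# Strip every CR into a temp file and replace the original only if tr succeeded.
tr -d '\r' < "$script" > "$script.tmp" && mv "$script.tmp" "$script"

# Count lines that still contain a CR (grep -c prints 0 when nothing matches).
cr_lines=$(grep -c $'\r' "$script")
echo "lines with CR: $cr_lines"   # prints: lines with CR: 0

# The repaired script now parses and runs.
result=$(bash "$script")
echo "$result"   # prints: has value
```

If GNU sed is available (an assumption: BSD/macOS sed requires an argument after -i and may not interpret \r), sed -i 's/\r$//' script performs the same fix in place, removing only CRs that end a line. Note also that the tr-then-mv sequence replaces the file with a newly created one, so an executable bit on the original is not carried over; run chmod +x again if needed.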

    To diagnose the problem I suggest using cat -v script, as its output is easy to parse visually: if you see ^M (which represents \r) at the end of the output lines, you know you're dealing with a file with Windows line endings.
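For illustration, here is what cat -v makes of two CRLF-terminated lines taken from the asker's script; the sample text is generated with printf rather than read from a real file, and bash is assumed:

```shell
#!/usr/bin/env bash
# Two sample lines with CRLF endings, as a Windows editor might save them.
shown=$(printf 'then\r\n    echo "has value"\r\n' | cat -v)
echo "$shown"
# prints:
# then^M
#     echo "has value"^M

# For a real file, a quick yes/no check ('script' is a placeholder name):
# grep -q $'\r' script && echo "script contains CR characters"
```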


    Why Your Script Failed So Obscurely

    Normally, a shell script that mistakenly has Windows-style CRLF line endings, \r\n (rather than the required Unix-style LF-only endings, \n), and starts with the shebang line #!/bin/bash fails in a manner that does indicate the cause of the problem:

    /bin/bash^M: bad interpreter
    

    as a quick SO search can attest. The ^M indicates that the CR was considered part of the interpreter path, which obviously fails.
    (If, by contrast, the script's shebang line is env-based, such as #!/usr/bin/env bash, the error message differs, but still points to the cause: env: bash\r: No such file or directory)
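A sketch that reproduces the "bad interpreter" message on a regular Linux system (not Cygwin, which tolerates a CRLF-terminated shebang line). It assumes bash as the invoking shell, since the wording of this message comes from bash, and a temp directory that permits executing files; the file name is hypothetical:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
script="$dir/shebang-crlf.sh"   # hypothetical file name
# A two-line script whose shebang line ends in CRLF.
printf '#!/bin/bash\r\necho hi\r\n' > "$script"
chmod +x "$script"

# Execute the file directly so the kernel (not "bash file") processes the shebang line.
err=$("$script" 2>&1)
echo "$err"   # on Linux, expected to contain: /bin/bash^M: bad interpreter
```

Running the same file as bash "$script" instead would not show this error, because the shebang line is then just a comment.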

    The reason you did not see this problem is that you're running in the Windows Unix-emulation environment Cygwin, which - unlike on Unix - allows a shebang line to end in CRLF (presumably to also support invoking other interpreters on Windows that do expect CRLF endings).

    The CRLF problem therefore didn't surface until later in your script, and the fact that you had no empty lines after the shebang line further obscured the problem:

    • An empty CRLF-terminated line would cause Bash (4.x) to complain as follows: bash: line <n>: $'\r': command not found, because Bash tries to execute the CR as a command (since it doesn't recognize it as part of the line ending).

    • The comment lines directly following the shebang line are unproblematic, because a comment line ending in CR is still syntactically valid.
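To see the empty-line case from the first bullet, the sketch below (bash assumed; the file name blank-line.sh is hypothetical) writes a script whose second line is an "empty" line consisting only of a CR, and runs it with bash explicitly so the shebang line plays no role. The exact message text can differ between Bash versions, so the comment describes what to look for rather than quoting it verbatim:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
script="$dir/blank-line.sh"   # hypothetical file name
# Line 1: shebang; line 2: an "empty" line that is really just a CR; line 3: a command.
printf '#!/bin/bash\r\n\r\necho done\r\n' > "$script"

# Run via bash explicitly and keep only stderr (stdout goes to /dev/null).
err=$(bash "$script" 2>&1 >/dev/null)
echo "$err"   # expected: a "line 2" error saying the CR ($'\r') is not a command
```

The comment line 1 causes no error, matching the second bullet: a comment that ends in CR is still just a comment.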

    • Finally, the if statement broke the script, in an obscure manner:

      • If your file were to end with a line break, as is usually the case, you would have gotten syntax error: unexpected end of file:

        • Bash sees the line-ending then and fi tokens as then\r and fi\r (i.e., with the CR appended), and therefore does not recognize them as keywords. As a result, Bash never sees the end of the if compound command, and complains about reaching the end of the file before the if statement is complete.
      • Since your file did not, you got syntax error near unexpected token 'fi':

        • The final fi, due to not being followed by a CR, is recognized as a keyword by Bash, whereas the preceding then wasn't (as explained). In this case, Bash therefore saw keyword fi before ever seeing then, and complained about this out-of-place occurrence of fi.
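Both outcomes can be reproduced side by side. This sketch (bash assumed; file names hypothetical) writes the asker's if statement with CRLF endings twice, once with a final CRLF and once ending right after fi, and captures Bash's error for each. The exact wording may vary between Bash versions, so the comments only name the key phrase:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
# The asker's if statement with CRLF endings; printf's %b expands the \r and \n escapes.
body='if [ "test" = "test" ]\r\nthen\r\n    echo "has value"\r\nfi'

# Variant 1: the file ends with a final CRLF, as files usually do.
printf '%b\r\n' "$body" > "$dir/with-final-newline.sh"
# Variant 2: the file ends right after "fi", like the asker's file.
printf '%b' "$body" > "$dir/no-final-newline.sh"

# Capture Bash's complaint for each variant (nothing runs because parsing fails).
err_eof=$(bash "$dir/with-final-newline.sh" 2>&1)
err_fi=$(bash "$dir/no-final-newline.sh" 2>&1)

echo "$err_eof"   # expected to contain: syntax error: unexpected end of file
echo "$err_fi"    # expected to contain: syntax error near unexpected token `fi'
```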

    Optional Background Information

    Shell scripts that look OK but break due to characters that are either invisible or only look the same as the required characters are a common problem that usually has one of the following causes:

    • Problem A: The file has Windows-style CRLF (\r\n) line endings rather than Unix-style LF-only (\n) line endings - which is the case here.

      • Copying a file from a Windows machine or using an editor that saves files with CRLF sequences are among the possible causes.
    • Problem B: The file has non-ASCII Unicode whitespace and punctuation that looks like regular whitespace, but is technically distinct.

      • A common cause is source code copied from websites that use non-ASCII whitespace and punctuation for formatting code for display purposes;
        an example is use of the no-break space Unicode character (U+00A0; UTF-8 encoding 0xc2 0xa0), which is visually indistinguishable from a normal (ASCII) space (U+0020).
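A minimal sketch of how a single no-break space breaks a command. It assumes bash (its printf understands \xHH escapes), and the file name is hypothetical. Bash separates words only on ASCII blanks such as space and tab, so echo followed by U+00A0 and hello is read as one command name:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
script="$dir/nbsp.sh"   # hypothetical file name
# "echo" and "hello" joined by U+00A0 (UTF-8 bytes 0xc2 0xa0) instead of an ASCII space.
printf 'echo\xc2\xa0hello\n' > "$script"

err=$(bash "$script" 2>&1)
status=$?
echo "$err"      # a "command not found" error: the whole line was read as one word
echo "$status"   # prints: 127 (Bash's exit status for "command not found")
```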

    Diagnosing the Problem

    The following cat command visualizes:

    • all normally invisible ASCII control characters, such as \r as ^M.
    • all non-ASCII characters (assuming the now prevalent UTF-8 encoding), such as the no-break space Unicode character, which appears as M-BM- (with a trailing space).

    ^M is an example of caret notation, and non-ASCII bytes are shown with an M- prefix. The notation is not obvious, especially with multi-byte characters, but, beyond ^M, it's usually not necessary to know exactly what it stands for: you just need to note whether ^M sequences are present at all (problem A), or whether M- sequences are present in unexpected places (problem B).

    The last point is important: non-ASCII characters can be a legitimate part of source code, such as in string literals and comments. They're only a problem if they're used in place of ASCII punctuation.

    LC_ALL=C cat -v script
    

    Note: If you're using GNU utilities, the LC_ALL=C prefix is optional.
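To show what the notation looks like in practice, this sketch pipes one illustrative line containing both a no-break space and a CRLF ending through LC_ALL=C cat -v (bash's printf assumed for the \xHH escapes):

```shell
#!/usr/bin/env bash
# Sample line with a no-break space (problem B) and a CRLF ending (problem A).
shown=$(printf 'echo\xc2\xa0hello\r\n' | LC_ALL=C cat -v)
echo "$shown"   # prints: echoM-BM- hello^M
```

Here M-B and "M- " (M- followed by a space) are the two bytes 0xc2 and 0xa0 of the no-break space, and the trailing ^M is the CR.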

    Solutions to Problem A: translating line endings from CRLF to LF-only

    • For solutions based on standard or usually-available-by-default utilities (tr, awk, sed, perl), see this answer of mine.

    • A more robust and convenient option is the widely used dos2unix utility, if it is already installed (typically, it is not) or if installing it is an option.
      How you install it depends on your platform; e.g.:

      • on Ubuntu: sudo apt-get install dos2unix
      • on macOS, with Homebrew installed: brew install dos2unix

    dos2unix script would convert the line endings to LF and update file script in place.

    Note that dos2unix also offers additional features, such as changing the character encoding of a file.
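If a script has to work whether or not dos2unix is present, one option is to prefer it when installed and otherwise fall back to the tr approach. The helper name to_lf below is hypothetical, and the sketch assumes bash:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a file's CRLF line endings to LF in place.
to_lf() {
  local file=$1
  if command -v dos2unix >/dev/null 2>&1; then
    dos2unix -q "$file"   # -q: suppress dos2unix's informational messages
  else
    # Fallback: standard tr; the original is replaced only if tr succeeded.
    tr -d '\r' < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  fi
}

# Usage on a throwaway CRLF file.
dir=$(mktemp -d)
printf 'echo ok\r\n' > "$dir/demo.sh"
to_lf "$dir/demo.sh"
out=$(bash "$dir/demo.sh")
echo "$out"   # prints: ok
```

The availability check uses command -v because it is specified by POSIX, unlike which, whose behavior varies between systems.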

    Solutions to Problem B: translating Unicode punctuation to ASCII punctuation

    Note: By punctuation I mean both whitespace and characters such as -

    The challenge in this case is that only Unicode punctuation should be targeted, whereas other non-ASCII characters should be left alone; thus, use of character-transcoding utilities such as iconv is not an option.

    nws is a utility (that I wrote) that offers a Unicode-punctuation-to-ASCII-punctuation translation mode while leaving non-punctuation Unicode characters alone; e.g.:

    nws -i --ascii script  # translate Unicode punct. to ASCII, update file 'script' in place
    

    Installation:

    • If you have Node.js installed, install it by simply running [sudo] npm install -g nws-cli, which will place nws in your path.

    • Otherwise: See the manual installation instructions.

    nws has several other functions focused on whitespace handling, including CRLF-to-LF and vice-versa translations (--lf, --crlf).
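If installing nws is not an option and the only offender is the no-break space, a much narrower fix is possible with standard tools. This is a swapped-in technique, not what nws does: the sketch below (bash assumed; file name hypothetical) replaces only U+00A0 with an ASCII space and leaves other Unicode punctuation, such as curly quotes or en dashes, untouched:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
script="$dir/nbsp.sh"   # hypothetical file name
printf 'echo\xc2\xa0hello\n' > "$script"

nbsp=$(printf '\xc2\xa0')   # the two UTF-8 bytes of U+00A0
# LC_ALL=C makes sed treat the input as raw bytes, so the two-byte pattern matches as-is.
LC_ALL=C sed "s/$nbsp/ /g" "$script" > "$script.tmp" && mv "$script.tmp" "$script"

out=$(bash "$script")
echo "$out"   # prints: hello
```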