Tags: linux, windows, bash, macos, string-concatenation

Concatenation of strings in bash results in substitution


I need to read a file into an array and append a string to the end of each line. Here is my bash script:

#!/bin/bash

IFS=$'\n' read -d '' -r -a lines < ./file.list
for i in "${lines[@]}"
do
    tmp="$i"
    tmp="${tmp}stuff"
    echo "$tmp"
done

However, when I run it, the appended string replaces the beginning of each line instead of being added to the end.

For example, file.list contains:

http://www.example1.com
http://www.example2.com

What I need is:

http://www.example1.comstuff
http://www.example2.comstuff

But after running the script above, I see this in the terminal:

stuff//www.example1.com
stuff//www.example2.com

By the way, my computer runs macOS.

The problem also occurs when I concatenate strings with awk, printf, or echo, for example echo $tmp"stuff" or echo "${tmp}""stuff".


Solution

  • The file ./file.list was most probably generated on a Windows system or, at least, saved using the Windows end-of-line convention.

    Windows marks the end of each line in a text file with a sequence of two characters: CR (\r) followed by LF (\n). Unix-like systems (Linux, and macOS starting with version 10) use LF alone as the end-of-line character.
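
    To check this on your own machine, you can dump the raw bytes of a line. The sketch below builds the line in memory instead of reading ./file.list, so it runs anywhere; the variable names crlf_line and lf_line are only for illustration. od -c prints CR as \r and LF as \n, and the two byte counts should differ by exactly the extra CR.

    ```shell
    #!/usr/bin/env bash
    # The same URL with a Windows (CRLF) ending and a Unix (LF) ending.
    crlf_line=$'http://www.example1.com\r\n'
    lf_line=$'http://www.example1.com\n'

    # od -c shows every byte; look for "\r  \n" versus a lone "\n".
    printf '%s' "$crlf_line" | od -c
    printf '%s' "$lf_line" | od -c

    # Byte counts (tr strips the padding that macOS wc adds).
    crlf_bytes=$(printf '%s' "$crlf_line" | wc -c | tr -d ' ')
    lf_bytes=$(printf '%s' "$lf_line" | wc -c | tr -d ' ')
    echo "CRLF: $crlf_bytes bytes, LF: $lf_bytes bytes"
    ```

    On the real file, od -c ./file.list | head should show \r \n at the end of every line if this diagnosis is right.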

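    The assignment IFS=$'\n' in front of read in your code, combined with -d '' (read up to a NUL byte or the end of the input, i.e. the whole file), tells read to split the input into array elements at each LF. read doesn't store the LF characters in the array it produces (lines[]), but CR is not in IFS, so each entry of lines[] still ends with a CR character.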
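
    A minimal way to see this, assuming bash: the sketch below feeds two CRLF lines to the same read command used in the question (through a here-string instead of ./file.list) and prints each array entry with printf %q, which makes the CR visible as \r.

    ```shell
    #!/usr/bin/env bash
    # Simulated content of a two-line file saved with CRLF endings.
    input=$'http://www.example1.com\r\nhttp://www.example2.com\r\n'

    # Same read as in the question. read -d '' returns non-zero when it
    # hits the end of the input, so "|| true" keeps set -e scripts alive.
    IFS=$'\n' read -d '' -r -a lines <<< "$input" || true

    for i in "${lines[@]}"; do
        # %q prints the hidden CR as \r; the length includes the CR.
        printf 'length %d: %q\n' "${#i}" "$i"
    done
    ```

    The extra newline that <<< appends is harmless here, because LF is an IFS whitespace character and consecutive ones count as a single separator.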

    The line tmp="${tmp}stuff" does what it is supposed to do, i.e. it appends the word stuff to the content of the variable tmp (a line read from the file).

    The first line read from the input file contains the string http://www.example1.com followed by the CR character. After the string stuff is appended, the content of variable tmp is:

    http://www.example1.com$'\r'stuff
    

    The CR character is not printable. It has a special meaning when it is printed on a terminal: it moves the cursor to the start of the current line (column 1) without advancing to the next line.

    When echo prints the line above, it prints (starting on a new line) http://www.example1.com, then the CR character, which sends the cursor back to the start of the line, where it prints the string stuff. The stuff fragment overwrites the first 5 characters already printed on that line (http:), and the result, as it is visible on screen, is:
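
    The effect can be reproduced without the file. In the sketch below, plain echo lets the terminal interpret the CR, so the display should show stuff//www.example1.com, while piping the same string through cat -v prints the CR as ^M and reveals what the variable really contains.

    ```shell
    #!/usr/bin/env bash
    tmp=$'http://www.example1.com\r'   # a line as read from the CRLF file
    tmp="${tmp}stuff"                   # the append from the question

    # Raw output: the terminal honours the CR and "stuff" overwrites "http:".
    echo "$tmp"

    # cat -v shows control characters in ^X notation; CR becomes ^M.
    shown=$(echo "$tmp" | cat -v)
    echo "$shown"
    ```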

    stuff//www.example1.com
    

    The solution is to get rid of the CR characters from the input file. There are several ways to accomplish this goal.

    A simple way to remove the CR characters from the input file is to use the command:

    sed -i.bak s/$'\r'//g file.list
    

    It removes all the CR characters from file.list, writes the result back into file.list, and keeps the original content as file.list.bak (a backup copy in case the result is not what you expect).
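
    The following sketch tries the sed command on a throwaway copy, assuming bash and a temporary directory created with mktemp -d. It recreates a small CRLF file.list, runs the command, and counts the CR bytes left in file.list and in file.list.bak. The -i.bak form (suffix attached to -i) is accepted by both the BSD sed shipped with macOS and GNU sed.

    ```shell
    #!/usr/bin/env bash
    # Scratch directory, so a real file.list is never touched.
    workdir=$(mktemp -d)
    cd "$workdir" || exit 1

    # Recreate file.list with Windows line endings.
    printf 'http://www.example1.com\r\nhttp://www.example2.com\r\n' > file.list

    # The command from the answer: strip every CR, keep a .bak backup.
    sed -i.bak s/$'\r'//g file.list

    # tr -cd '\r' deletes everything except CR bytes; wc -c counts them.
    cr_after=$(tr -cd '\r' < file.list | wc -c | tr -d ' ')
    cr_backup=$(tr -cd '\r' < file.list.bak | wc -c | tr -d ' ')
    echo "CRs in file.list: $cr_after, in file.list.bak: $cr_backup"

    # Clean up the scratch directory.
    cd / && rm -rf "$workdir"
    ```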

    Another way to get rid of the CR character is to ask the shell to remove it in the command where stuff is appended:

    tmp="${tmp/$'\r'/}stuff"
    

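    When a variable is expanded in a construct like ${tmp/a/b}, the first occurrence of a in $tmp is replaced with b; the doubled form ${tmp//a/b} replaces all occurrences. In this case we replace \r with nothing, and since a line from a CRLF file normally contains a single CR (at its end), replacing the first occurrence is enough.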
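
    For completeness, here is one way the whole loop could look with the CR handled in the shell. It is a sketch, not the answer's exact code: it swaps the array for a while read loop and uses ${line%$'\r'}, which removes a CR only when it is the last character of the line. The function name add_suffix is hypothetical.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: append "stuff" to every line read from stdin.
    #  - IFS= and -r keep each line exactly as it is in the file,
    #  - ${line%$'\r'} drops one trailing CR (a no-op for LF-only files),
    #  - "|| [ -n "$line" ]" keeps a last line that lacks a line terminator.
    add_suffix() {
        local line
        while IFS= read -r line || [ -n "$line" ]; do
            line=${line%$'\r'}
            printf '%s\n' "${line}stuff"
        done
    }

    # Demo on CRLF input built in memory.
    result=$(printf 'http://www.example1.com\r\nhttp://www.example2.com\r\n' | add_suffix)
    printf '%s\n' "$result"
    ```

    With the real file, the call would be add_suffix < ./file.list.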