I need to read a file into an array and concatenate a string at the end of each line. Here is my bash script:
#!/bin/bash
IFS=$'\n' read -d '' -r -a lines < ./file.list
for i in "${lines[@]}"
do
tmp="$i"
tmp="${tmp}stuff"
echo "$tmp"
done
However, when I do this, a replacement happens instead of a concatenation.
For example, file.list contains:
http://www.example1.com
http://www.example2.com
What I need is:
http://www.example1.comstuff
http://www.example2.comstuff
But after executing the script above, this is what I get on the terminal:
stuff//www.example1.com
stuff//www.example2.com
Btw, my machine runs macOS.
The problem also occurs when concatenating strings via the awk, printf, and echo commands, for example echo $tmp"stuff" or echo "${tmp}""stuff".
The file ./file.list was, most probably, generated on a Windows system or, at least, saved using the Windows convention for end of line.
Windows uses a sequence of two characters to mark the end of a line in a text file: CR (\r) followed by LF (\n). Unix-like systems (Linux and macOS starting with version 10) use a single LF as the end-of-line character.
The assignment IFS=$'\n' in front of read in your code tells read to use LF as the line separator. read doesn't store the LF characters in the array it produces (lines[]), but each entry of lines[] ends with a CR character.
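One way to confirm that the entries really end with CR is to make the invisible character visible. The sketch below is only an illustration: it builds a throwaway sample file with Windows line endings (the temporary file and the variable names sample and first are assumptions made for the demo, not part of the original script) and inspects the first line.

```bash
# Throwaway sample with CRLF line endings (hypothetical stand-in for file.list).
sample=$(mktemp)
printf 'http://www.example1.com\r\nhttp://www.example2.com\r\n' > "$sample"

# Read the first line using LF as the delimiter, as the script does.
IFS= read -r first < "$sample"

# The URL has 23 visible characters; the reported length also counts the trailing CR.
echo "length: ${#first}"

# od -c prints every byte, so the CR shows up as \r after the last visible character.
printf '%s' "$first" | od -c

rm -f "$sample"
```

With this sample the reported length is 24 rather than 23. Running od -c on the real file.list should give the same kind of view.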
The line tmp="${tmp}stuff" does what it is supposed to do, i.e. it appends the word stuff to the content of the variable tmp (a line read from the file).
The first line read from the input file contains the string http://www.example1.com followed by the CR character. After the string stuff is appended, the content of the variable tmp is:
http://www.example1.com$'\r'stuff
The CR character is not printable. It has a special meaning when it is printed on the terminal: it moves the cursor to the start of the line (column 1) without advancing to a new line.
When echo prints the line above, it prints (starting on a new line) http://www.example1.com, then the CR character, which sends the cursor back to the start of the line, where it prints the string stuff. The stuff fragment overwrites the first 5 characters already printed on that line (http:) and the result, as it is visible on screen, is:
stuff//www.example1.com
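The effect can be reproduced without any input file. This small sketch (the variable cr is introduced only for the demo) builds the same string by hand. On a terminal the first printf shows the overwritten form, while od -c reveals that all the characters are still there.

```bash
cr=$(printf '\r')                 # a lone CR; equivalent to $'\r' in bash
tmp="http://www.example1.com${cr}"
tmp="${tmp}stuff"

printf '%s\n' "$tmp"              # on a terminal this displays as: stuff//www.example1.com
printf '%s\n' "$tmp" | od -c      # byte view: the \r sits between "com" and "stuff"
echo "length: ${#tmp}"            # 23 + 1 (CR) + 5 = 29
```

Nothing is lost or replaced in the variable itself; only the way the terminal renders it is misleading.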
The solution is to get rid of the CR characters from the input file. There are several ways to accomplish this goal.
A simple way to remove the CR characters from the input file is to use the command:
sed -i.bak s/$'\r'//g file.list
It removes all the CR characters from the content of file.list, writes the result back to file.list and keeps the original file as file.list.bak (a backup copy in case the command doesn't produce the output you expect).
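If you would rather not edit the file in place, tr can do the same job while writing to a new file. This is a sketch of an alternative, not part of the command above; the sample file and the names sample, clean and first_clean are made up for the demo.

```bash
# Throwaway CRLF sample (hypothetical stand-in for file.list) and an output file.
sample=$(mktemp)
clean=$(mktemp)
printf 'http://www.example1.com\r\nhttp://www.example2.com\r\n' > "$sample"

# tr -d deletes every CR byte from its input; the original file is left untouched.
tr -d '\r' < "$sample" > "$clean"

# The first line of the cleaned copy no longer carries a CR, so appending works.
IFS= read -r first_clean < "$clean"
echo "${first_clean}stuff"
echo "length: ${#first_clean}"

rm -f "$sample" "$clean"
```

The trade-off is that you end up with a second file to manage, whereas sed -i.bak rewrites the original and keeps the backup for you.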
Another way to get rid of the CR character is to ask the shell to remove it in the command where stuff is appended:
tmp="${tmp/$'\r'/}stuff"
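When a variable is expanded in a construct like ${tmp/a/b}, the first occurrence of a in $tmp is replaced with b (use ${tmp//a/b} to replace every occurrence). In this case we replace \r with nothing; since each line contains a single CR, at its end, replacing the first occurrence is enough.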
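Putting it together, here is one possible version of the loop that strips a trailing CR from each line before appending. It is a sketch under some assumptions: the function name append_suffix and its arguments are hypothetical, it reads the file line by line instead of into an array, and it uses the ${line%pattern} form, which removes the pattern only when it sits at the end of the value.

```bash
# append_suffix FILE SUFFIX  (hypothetical helper, not from the original script)
append_suffix() {
    local file="$1" suffix="$2" line cr
    cr=$(printf '\r')                     # a lone CR; same as $'\r' in bash
    # The "|| [ -n ... ]" part also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%"$cr"}                # drop one trailing CR, if present
        printf '%s\n' "${line}${suffix}"
    done < "$file"
}

# Demo on a throwaway file with CRLF line endings.
sample=$(mktemp)
printf 'http://www.example1.com\r\nhttp://www.example2.com\r\n' > "$sample"
append_suffix "$sample" stuff
rm -f "$sample"
```

Because the CR is removed only from the end of each line, files that already use Unix line endings pass through unchanged.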