Tags: bash, shell, unix, wc

wc -m seems to stop while loop in bash


I am doing an introductory course on UNIX - part of it is bash scripting. I seem to have understood the concepts, but I can't wrap my head around this particular problem.

I have a txt file that consists of a single column of random usernames. That txt file is then passed as a parameter to my bash script, which uses each username to fetch a page and count the characters on that page. If the page is fetched successfully, the character count is saved along with the username in a different txt file.

Here is the code:

#!/bin/bash
filename=$1

while read username; do
    curl -fs "http://example.website.domain/$username/index.html"
    if [ $? -eq 0 ]
    then
        x=$(wc -m)
        echo "$username $x" > output.txt
    else
        echo "The page doesn't exist"
    fi
done < $filename

Now the problem is that after one successful fetch, it counts the characters, writes them to the file, and then just finishes the loop and exits the program. If I remove just the "wc -m" bit, the code runs perfectly fine.

Q: Is that supposed to happen, and how should I get around it to achieve my goal? Or have I made a mistake somewhere else?


Solution

  • The code shown does not do what you think (and claim in your question).

    Your curl command fetches the web page and writes it to stdout: you are not keeping that output for later use. Then, your wc has no argument, so it starts reading from stdin. And stdin holds the list of usernames from $filename, so the number that gets computed is not the character count of the page, but the count of the characters remaining in the file. Once those have been counted, there is nothing left on stdin to read, so the loop ends because it has reached the end of the file.
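
    You can see the stdin-stealing effect in isolation with a minimal sketch (names.txt is a hypothetical file with three lines: alice, bob, carol):

    while read username; do
        echo "processing $username"
        wc -m          # no argument: reads (and empties) the loop's stdin, i.e. the rest of names.txt
    done < names.txt
    # Prints "processing alice" followed by the character count of the two
    # remaining lines; the loop never gets to see bob or carol.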

    You are looking for something like:

    #!/bin/bash
    filename="$1"
    
    set -o pipefail
    rm -f output.txt
    while read username; do
        x=$(curl -fs "http://example.website.domain/$username/index.html" | wc -m)
        if [ $? -eq 0 ]
        then
            echo "$username $x" >> output.txt
        else
            echo "The page doesn't exist"
        fi
    done < "$filename"
    

    Here, the fetched page is fed directly to wc. If curl fails you won't notice (by default, the exit code of a pipeline is the exit code of its last command), so we use set -o pipefail to get the exit code of the rightmost command in the pipeline that exits with a non-zero status. Now you can check whether everything went OK and, in that case, write the result.
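
    You can check the effect of pipefail directly in an interactive shell; a minimal sketch:

    false | wc -m; echo $?     # wc prints 0 and exits cleanly, so the pipeline reports status 0
    set -o pipefail
    false | wc -m; echo $?     # wc still prints 0, but the pipeline now reports 1 (the status of false)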

    I also added an rm of the output file to make sure we are not growing an existing one, and changed the redirection to the output file to an append, so the file is not re-created on every iteration, which would leave only the result of the last iteration (thanks to @tripleee for noting this).
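
    An alternative (my addition, not part of the original answer) is to redirect the whole loop once, so output.txt is truncated a single time and no rm is needed; error messages are sent to stderr so they stay out of the result file:

    #!/bin/bash
    filename="$1"

    set -o pipefail
    while read username; do
        x=$(curl -fs "http://example.website.domain/$username/index.html" | wc -m)
        if [ $? -eq 0 ]
        then
            echo "$username $x"                 # collected by the redirection after 'done'
        else
            echo "The page doesn't exist" >&2   # goes to stderr, not output.txt
        fi
    done < "$filename" > output.txt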

    Update (by popular request):

    The pattern:

    <cmd>
    if [ $? -eq 0 ]...
    

    is usually a bad idea. It is better to go for:

    if <cmd>...
    

    So it would be better if you switch to:

    if x=$(curl -fs "http://example.website.domain/$username/index.html" | wc -m); then
        echo...
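
    Applied to the script above, the whole loop would read like this (a completed sketch of that pattern, using the same URL and file names as before):

    #!/bin/bash
    filename="$1"

    set -o pipefail
    rm -f output.txt
    while read username; do
        # The if tests the exit status of the pipeline inside the assignment
        if x=$(curl -fs "http://example.website.domain/$username/index.html" | wc -m)
        then
            echo "$username $x" >> output.txt
        else
            echo "The page doesn't exist"
        fi
    done < "$filename"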