
How to properly process and print files with spaces in bash


I'm writing a simple recursive ls program in bash (which I'm not very experienced at, so feel free to be brutal).

The program is supposed to print out each file (possibly directory) on a separate line, and each time a new directory is entered, the output is shifted over by 4 spaces, to give it a tree-like output.

Currently, it doesn't print out files with spaces correctly, and it doesn't put a forward slash after directories. (More details below.)

Code

recls () {

    # store current working directory
    # issues: seems bad to have cwd defined up here and used down below in getAbsolutePath -- too much coupling
    cwd=$PWD
    # get absolute path of arg
    argdir=`getAbsolutePath "$@"`
    # check if it exists
    if [ ! -e $argdir ]; then
        echo "$argdir does not exist"
        return 1
    fi
    echo "$argdir exists"
    # check if it's a directory
    if [ ! -d $argdir ]; then
        echo "$argdir is not a directory"
        return 2
    fi
    echo "$argdir is a directory"
    tab=""
    recls_internal $argdir
    return 0

}

recls_internal () {

    for file in $@; do
        echo -n "$tab${file##/*/}"
        if [ -d $file ]; then
            # print forward slash to show it's a directory
            echo "/"
            savedtab=$tab
            tab="$tab    "
            myls_internal $file/*
            tab=$savedtab
        else
            # if not a directory, print a new line
            echo ""
        fi   
    done

}

getAbsolutePath () {

    if [ -z ${1##/*} ]; then
        echo "$1"
    else
        echo "$cwd/$1"
    fi

}

Output

The script is contained in a folder called bash-practice. When I do recls ., I get the following output:

./
    myls.sh
    myls.sh~
    recdir.sh
    recls.sh
    recls.sh~
    sample
    document.txt
    sample-folder
        sample-stuff
            test-12.txt
        test-1.txt
        test-2.txt
        sort-test.txt
        sort-text-copy.txt
        test-5-19-14-1

The Problem

As you can see, the indentation is working properly but there are two problems:

1) The file sample document.txt is spread across two lines, because it has a space in it.

2) Every directory should have a forward slash after it, but for some reason that only works on the very first one.

Attempted Solution

In order to fix (1), I tried saving the internal field separator (IFS) and replacing it with a newline character like so:

...
tab=""
savedIFS=$IFS
IFS="\n"
recls_internal $argdir
IFS=$savedIFS
return 0

But this did not work at all. It didn't even display more than the first folder. Clearly my understanding of things is not correct.
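
One likely reason the IFS attempt misbehaved, sketched here under the assumption that the script runs in bash: inside double quotes, \n is not an escape sequence, so IFS="\n" sets the separators to a backslash and the letter n rather than to a newline. The helper show_words is hypothetical and only prints each word it receives in angle brackets.

```shell
# show_words is a hypothetical helper: it prints every argument on its own line.
show_words () { printf '<%s>\n' "$@"; }

name='sample document.txt'

# Each case runs in a subshell so the modified IFS does not leak out.

echo 'IFS set with double quotes:'
# "\n" is a backslash plus the letter n, so the name is split at the n.
(IFS="\n"; show_words $name)
# prints <sample docume> and <t.txt>

echo 'IFS set with ANSI-C quoting:'
# $'\n' is bash syntax for a real newline; the space no longer splits the name.
(IFS=$'\n'; show_words $name)
# prints <sample document.txt>
```

Quoting the expansions, as in "$file", avoids the need to touch IFS at all.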

As for (2), I don't see any reason why it shouldn't be working as intended.

Conclusion

bash is difficult for me as it seems to have more unusual syntax than most other programming languages (being a shell scripting language), so I would appreciate any insights into my mistakes, as well as a solution.

Update #1

I went to the site http://www.shellcheck.com that mklement0 suggested, and its hints were basically all to double quote things more. When I double quoted "$@", the program correctly printed the file sample document.txt, but then directly after that, it gave me a "binary operator expected" error. Here is a print out of what it looks like now:

[screenshot of the program output]
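
The "binary operator expected" message is typical of what [ reports when an unquoted variable containing a space expands to more than one word. A minimal sketch, assuming the file name from the output above and a working directory where no directory of that name exists; the variable name file mirrors the script:

```shell
file='sample document.txt'

# Unquoted: [ receives three arguments (-d, sample, document.txt) and cannot
# parse them. Capture the complaint it writes to stderr; "|| true" keeps the
# failing test from aborting a script that runs under "set -e".
unquoted_error=$( { [ -d $file ]; } 2>&1 ) || true
echo "unquoted: ${unquoted_error:-no error}"

# Quoted: [ receives -d plus a single argument, so the test runs normally.
if [ -d "$file" ]; then
    echo "quoted: $file is a directory"
else
    echo "quoted: $file is not a directory"
fi
```

The same reasoning applies to every [ -e $argdir ] and [ -d $file ] test in the script.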

Update #2 [problem solved?]

OK, it turns out that I had a typo which was causing it to default to an earlier version of my function called myls_internal when it recursed. This earlier version didn't mark directories with a forward slash. The error message in the "Update" section was also fixed. I changed the line

myls_internal "$file/*"

to

recls_internal $file/*

and now it seems to work properly. If anyone is in the middle of writing an answer, I still appreciate your insights as I don't really understand the mechanics of how quoting "$@" fixed the spacing issue.

Fixed code:

recls () {

    # store current working directory
    # issues: seems bad to have cwd defined up here and used down below in getAbsolutePath -- too much coupling
    cwd=$PWD
    # get absolute path of arg
    argdir=$(getAbsolutePath "$@")
    # check if it exists
    if [ ! -e $argdir ]; then
        echo "$argdir does not exist"
        return 1
    fi
    echo "$argdir exists"
    # check if it's a directory
    if [ ! -d $argdir ]; then
        echo "$argdir is not a directory"
        return 2
    fi
    echo "$argdir is a directory"
    tab=""
    recls_internal $argdir
    return 0

}

recls_internal () {

    for file in "$@"; do
        echo -n "$tab${file##/*/}"
        if [ -d "$file" ]; then
            # print forward slash to show it's a directory
            echo "/"
            savedtab=$tab
            tab="$tab    "
            recls_internal $file/*
            tab=$savedtab
        else
            # if not a directory, print a new line
            echo ""
        fi   
    done

}

getAbsolutePath () {

    if [ -z ${1##/*} ]; then
        echo "$1"
    else
        echo "$cwd/$1"
    fi

}

Fixed output:

[screenshot of the fixed output]

Update #3

The line

recls_internal $file/*

should instead be

recls_internal "$file"/*

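which handles directories with spaces in them correctly. Otherwise, for a folder such as cs 350 containing Homework1.pdf and Homework2.pdf, the unquoted $file is split at the space before the * is expanded, so the arguments become the two words

cs 350/*

(and the pattern 350/* does not match the real files) when they should be

"cs 350/Homework1.pdf" "cs 350/Homework2.pdf"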

I think? I don't really get the finer details of what's going on, but that seemed to fix it.
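
The order of expansions accounts for this: bash splits the unquoted result of $file into words first, and only then expands * within each word. A small sketch, assuming a scratch directory created with mktemp -d and default shell options (no nullglob); show_words is a hypothetical helper that prints each argument in angle brackets:

```shell
# show_words is a hypothetical helper: it prints every argument on its own line.
show_words () { printf '<%s>\n' "$@"; }

# Build the example folder inside a throwaway directory.
scratch=$(mktemp -d)
mkdir "$scratch/cs 350"
touch "$scratch/cs 350/Homework1.pdf" "$scratch/cs 350/Homework2.pdf"

file='cs 350'

# Unquoted: the value is split into cs and 350/* before globbing. 350/* matches
# nothing in the scratch directory, so the pattern is passed on literally.
unquoted=$(cd "$scratch" && show_words $file/*)

# Quoted: "$file" stays one word, and the unquoted /* is still expanded.
quoted=$(cd "$scratch" && show_words "$file"/*)

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
# unquoted: <cs> and <350/*>
# quoted:   <cs 350/Homework1.pdf> and <cs 350/Homework2.pdf>

rm -r "$scratch"
```

Note that the * has to stay outside the quotes: "$file/*" would pass the literal string cs 350/* as a single argument.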


Solution

  • To illustrate the difference between "$@" and $@, consider the following two functions:

    f() { for i in $@; do echo $i; done; }
    
    g() { for i in "$@"; do echo $i; done; }
    

    When these functions are called with the parameters a "b c" "d e", the results are:

    • function f

    f a "b c" "d e"
    a
    b
    c
    d
    e

    • function g

    g a "b c" "d e"
    a
    b c
    d e

    So when "$@" is within double quotes, the expansion keeps each parameter as a separate word (even if the parameter contains one or more spaces). When $@ is expanded without double quotes, a parameter containing a space is split into two words.

    In your script, you also need to surround argdir and file with double quotes wherever they are expanded, so that a directory or file name containing a space is treated as a single value. Your script, modified accordingly, is below.

    #! /bin/bash -u
    recls () {
    
        # store current working directory
        # issues: seems bad to have cwd defined up here and used down below in getAbsolutePath -- too much coupling
        cwd=$PWD
        # get absolute path of arg
        argdir=$(getAbsolutePath "$@")
        # check if it exists
        if [ ! -e "$argdir" ]; then
            echo "$argdir does not exist"
            return 1
        fi
        echo "$argdir exists"
        # check if it's a directory
        if [ ! -d "$argdir" ]; then
            echo "$argdir is not a directory"
            return 2
        fi
        echo "$argdir is a directory"
        tab=""
        recls_internal "$argdir"
        return 0
    
    }
    
    recls_internal () {
    
        for file in "$@"; do
            echo -n "$tab${file##/*/}"
            if [ -d "$file" ]; then
                # print forward slash to show it's a directory
                echo "/"
                savedtab=$tab
                tab="$tab    "
                recls_internal "$file"/*
                tab=$savedtab
            else
                # if not a directory, print a new line
                echo ""
            fi   
        done
    
    }
    
    getAbsolutePath () {
    
        if [ -z "${1##/*}" ]; then
            echo "$1"
        else
            echo "$cwd/$1"
        fi
    
    }
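
The comment in recls about coupling through cwd applies to the global tab as well. One possible variant, sketched here as an assumption rather than as part of the script above, passes the indentation as an argument and keeps every variable local. The name recls_tree and its two parameters are hypothetical.

```shell
# Hypothetical variant of recls_internal.
# $1: indentation prefix for this level, $2: directory whose entries are listed.
recls_tree () {
    local indent="$1" dir="$2" entry
    for entry in "$dir"/*; do
        # In an empty directory the glob matches nothing and stays literal;
        # skip that non-existent entry (this also skips dangling symlinks).
        [ -e "$entry" ] || continue
        if [ -d "$entry" ]; then
            # Directory: print its name with a trailing slash, then recurse
            # one level deeper with four more spaces of indentation.
            printf '%s%s/\n' "$indent" "${entry##*/}"
            recls_tree "$indent    " "$entry"
        else
            printf '%s%s\n' "$indent" "${entry##*/}"
        fi
    done
}
```

It would be called as recls_tree "" "$PWD". Because the prefix travels with each call, nothing needs to be saved and restored around the recursion.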