
Is using builtins to enhance performance negated by gratuitous use of subshells?


I'm writing a script where I need to make sure a string contains a comma. If it doesn't, I need the script to exit. Consider the script below, where my intent is to use only builtins to enhance performance:

#!/bin/sh

check_for_commas='This string must contain a comma'

comma_found='0'
iterate_through_string="$check_for_commas"
while [ -n "$iterate_through_string" ]; do
    char="$(printf '%.1s' "$iterate_through_string")"

    if [ "$char" = ',' ]; then
        comma_found='1'
        break
    fi

    iterate_through_string="${iterate_through_string#?}"
done

if [ "$comma_found" != '1' ]; then
    echo 'Your string does not contain a comma. Exiting...'
    exit 1
else
    echo 'Found a comma in the string. Script can continue...'
fi

I am using command substitution in this script, which spawns a subshell for every single character it iterates through. Compare with this:

#!/bin/sh

check_for_commas='This string must contain a comma'

if [ "$(echo "$check_for_commas" | grep -q -F ','; echo "$?")" = '1' ]; then
    echo 'Your string does not contain a comma. Exiting...'
    exit 1
else
    echo 'Found a comma in the string. Script can continue...'
fi

I clearly don't mind doing a little extra work to squeeze out extra performance. But I'm concerned that using so many subshells has defeated my whole initial intent.

Does my pursuit of only using builtins to enhance performance become useless when gratuitous use of subshells comes into the picture?


Solution

  • Command substitutions, as in $(printf ...), are indeed expensive -- and you don't need them for what you're doing here.

    case $check_for_commas in
      *,*) echo "Found a comma in the string";;
      *)   echo "No commas present; exiting"; exit 1;;
    esac
    

    More generally, a fork() alone costs less than a fork()/execve() pair, so a single subshell is cheaper than a single external-command invocation. But when you compare a loop that spawns one subshell per iteration against a single external-command invocation, which is cheaper depends on how many times the loop iterates and on how expensive each operation is on your operating system (forks are traditionally extra expensive on Windows, for example). That makes it a fact-intensive investigation. :)
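
One way to run that investigation is to put each approach in a function and time the script from outside. This is only a rough sketch: the function names, the `repeat` helper, and the default of 200 iterations are made up for illustration, and the numbers will vary by shell and operating system.

```shell
#!/bin/sh
# Sketch: three ways to test for a comma, callable by name so that each
# one can be timed separately. All names here are hypothetical.

s='This string must contain a comma'   # no comma: worst case for the loop

# One command substitution (one fork) per character.
has_comma_loop() {
    rest=$1
    while [ -n "$rest" ]; do
        char=$(printf '%.1s' "$rest")
        [ "$char" = ',' ] && return 0
        rest=${rest#?}
    done
    return 1
}

# One pipeline: a fork for the printf side plus a fork/exec for grep.
has_comma_grep() {
    printf '%s\n' "$1" | grep -q -F ','
}

# No fork at all: the shell does the pattern match itself.
has_comma_case() {
    case $1 in
        *,*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Call implementation "$1" on the sample string "$2" times.
repeat() {
    i=0
    while [ "$i" -lt "$2" ]; do
        "$1" "$s"
        i=$((i + 1))
    done
}

# First argument picks the implementation, second the iteration count.
repeat "${1:-has_comma_case}" "${2:-200}"
```

Saved under a name of your choosing (say `bench.sh`), it could be run as `time sh bench.sh has_comma_loop 1000`, then again with `has_comma_grep` and `has_comma_case`, to see where the crossover lies on your system.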

    (Regarding the originally proposed code: ksh93 will optimize away the fork in the specific var=$(printf ...) case; in bash, by contrast, you need to use printf -v var ... to get the same effect.)
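
If you do want to keep the character-by-character loop, the first character can be taken without any fork. The sketch below is an alternative to the original `$(printf '%.1s' ...)` line, not something the code above already does: it uses nested POSIX parameter expansion, and the `printf -v` branch is guarded so it only runs under bash. The sample string and variable names are illustrative.

```shell
#!/bin/sh
# Sketch: walk a string one character at a time without forking.

s='no comma here, until now'
comma_found=0
rest=$s

while [ -n "$rest" ]; do
    tail=${rest#?}          # everything after the first character
    char=${rest%"$tail"}    # strip that suffix, leaving only the first character
    if [ "$char" = ',' ]; then
        comma_found=1
        break
    fi
    rest=$tail
done

echo "comma_found=$comma_found"

# bash only: printf -v assigns straight to a variable, so no subshell is needed.
if [ -n "${BASH_VERSION:-}" ]; then
    printf -v first '%.1s' "$s"
    echo "first=$first"
fi
```

The inner expansion is quoted (`"$tail"`) so that the suffix is removed literally rather than being treated as a pattern.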