I've come across a rather mystifying bug in bash, which I suspect has to do with the shell expansion rules.
Here's the story: at work, I've been tasked with documenting a massive internal website for coordinating company resources. Unfortunately, the code is quite ugly, as it has outgrown its original purpose and "evolved" into the main resource for coordinating company efforts.
Most of the code is PHP. I wrote a few helper scripts to help me write the documentation; for example, one script extracts all the global PHP variables used in a PHP function.
At the center of all these scripts lies the "extract_function.sh" script. Given a single PHP function name and a PHP source file, it extracts and outputs that function.
Now here's the problem: somehow, as the script extracts the function, it inserts the output of ls / at seemingly random points in the output.
For example:
$ ./extract_function.sh my_function my_php_file.php
function my_function {
// php code
/etc
/bin
/proc
...
// more php code
}
Even more confusingly, I've only gotten this to occur for one specific function from one specific file! Now, since the function is quite huge (500+ lines, I mean it when I say the code is ugly!), I haven't been able for the life of me to figure out what is causing this, or to come up with a simpler ad-hoc function to produce this behavior. Also, company policy prevents me from sharing the actual code.
However, here is my code:
#!/usr/bin/env bash
program_name=$(basename $0);
function_name=$1;
file_name=$2;
if [[ -z "$function_name" ]]; then
(>&2 echo "Usage: $program_name function_name [file]")
exit 1
fi
if [[ -z "$file_name" ]] || [ "$file_name" = "-" ]; then
file_name="/dev/stdin";
fi
php_lexer_file=$(mktemp)
trap "rm -f $php_lexer_file" EXIT
read -r -d '' php_lexer_text << 'EOF'
<?php
$file = file_get_contents("php://stdin");
$tokens = token_get_all($file);
foreach ($tokens as $token)
if ($token === '{')
echo PHP_EOL, "PHP_BRACKET_OPEN", PHP_EOL;
else if ($token == '}')
echo PHP_EOL, "PHP_BRACKET_CLOSE", PHP_EOL;
else if (is_array($token))
echo $token[1];
else
echo $token;
?>
EOF
echo "$php_lexer_text" > $php_lexer_file;
# Get all output from beginning of function declaration
extracted_function_start=$(sed -n -e "/function $function_name(/,$ p" < $file_name);
# Prepend <?php so that php will parse the file as php
extracted_function_file=$(mktemp)
# A second EXIT trap replaces the first one, so clean up both temp files here
trap 'rm -f "$php_lexer_file" "$extracted_function_file"' EXIT
echo '<?php' > $extracted_function_file;
echo "$extracted_function_start" >> $extracted_function_file;
tokens=$(php $php_lexer_file < $extracted_function_file);
# I've checked, and at this point $tokens does not contain "/bin", "/lib", etc...
IFS=$'\n';
open_count=0;
close_count=0;
for token in $tokens; do # But here the output of "ls /" magically appears in $tokens!
if [ $token = "PHP_BRACKET_OPEN" ]; then
open_count=$((open_count+1))
token='{';
elif [ $token == "PHP_BRACKET_CLOSE" ] ; then
close_count=$((close_count+1))
token='}';
fi
echo $token;
if [ $open_count -ne 0 ] && [ $open_count -eq $close_count ]; then
break;
fi
done
Yes, I know that I shouldn't be using bash to manipulate php code, but I basically have two questions:
1) Why is bash doing this?
2) And, how can I fix it?
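One of the lines in $tokens is a * (or some other glob pattern that can match several files). Because $tokens is expanded unquoted in for token in $tokens, bash splits it into words and then performs pathname expansion on each word, so a line such as /* (most likely the start of a comment) is replaced by the entries of the root directory. That is why $tokens itself looks clean while the loop sees /bin, /etc, and so on. The unquoted $token in echo $token and in the [ ... ] tests is subject to the same expansion. If you cannot arrange for the token list to be free of shell metacharacters, you will need to jump through some hoops to avoid the expansion. One possible technique is to use read -ra to read the tokens into an array and then quote every expansion ("${array[@]}", "$token"); another is to turn globbing off with set -f before the loop.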
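As a rough sketch of the array approach, assuming the lexer output is already in a string with one token per line: the function name print_until_balanced and the array name lines are made up for this illustration and are not part of the original script. It reads the lines with read -r -d '' -a, which splits on newlines but does no pathname expansion, and then quotes every use of the token.

```bash
#!/usr/bin/env bash
# Sketch only: print_until_balanced is a hypothetical helper name.
# It expects the lexer output (one token per line) as its first argument.
print_until_balanced() {
    local tokens="$1" token
    local -a lines
    local open_count=0 close_count=0

    # Split on newlines into an array. Unlike an unquoted $tokens in a
    # for loop, read does not glob-expand the words it produces.
    # With -d '' read hits end of input and returns non-zero, hence || true.
    IFS=$'\n' read -r -d '' -a lines <<< "$tokens" || true

    for token in "${lines[@]}"; do
        if [[ $token == "PHP_BRACKET_OPEN" ]]; then
            open_count=$((open_count + 1))
            token='{'
        elif [[ $token == "PHP_BRACKET_CLOSE" ]]; then
            close_count=$((close_count + 1))
            token='}'
        fi
        # Quoted, so a line such as /* is printed literally.
        printf '%s\n' "$token"
        # Stop once the braces of the function body are balanced.
        if (( open_count != 0 && open_count == close_count )); then
            break
        fi
    done
}
```

In the script from the question, this would stand in for the final loop and be called as print_until_balanced "$tokens". Like the original for loop, it drops empty lines, because newline is treated as IFS whitespace.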