What is the best way to tokenize a bash shell command in PHP?


I need to take a (potentially) multi-line bash command and squash it into a single string that contains no newline or carriage return characters, yet produces the same result (i.e. the command semantics must not be affected).

Below are a few examples of inputs and corresponding expected outputs.

INPUT:

echo A
echo B

EXPECTED OUTPUT:

echo A;echo B

INPUT:

echo "continued
string"
echo "other"

EXPECTED OUTPUT:

echo "continued"$'\n'"string";echo "other"

INPUT:

cat file1 \
file2 \
file3

EXPECTED OUTPUT:

cat file1 file2 file3

INPUT:

for f in `pwd`/*
do
{ echo A; echo B
echo C; echo D; }
done

EXPECTED OUTPUT:

for f in `pwd`/*; do { echo A; echo B; echo C; echo D; }; done

And so on. Obviously, I cannot simply use

preg_replace('/[\r\n]+/', ';', $input);

because the shell supports compound commands and command lists, multiline strings, the line-continuation operator ('\'), and much more. It seems I have no other option than to tokenize the input command and go from there. My bash knowledge is mediocre, so there may be cases I have missed, and the solution needs to handle those as well.
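To make the failure concrete, here is a small sketch that applies the naive replacement to two of the sample inputs above. The function name `naiveSquash` is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical wrapper around the naive replacement from the question.
function naiveSquash(string $input): string
{
    return preg_replace('/[\r\n]+/', ';', $input);
}

// Line continuation: the backslash now escapes ';' instead of the newline.
echo naiveSquash("cat file1 \\\nfile2 \\\nfile3"), "\n";
// prints: cat file1 \;file2 \;file3

// Multiline string: the newline inside the quotes becomes a literal ';'.
echo naiveSquash("echo \"continued\nstring\""), "\n";
// prints: echo "continued;string"
```

In both cases the result is a different command from the one that was passed in, which is why some awareness of quoting is needed.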

Is there an existing PHP library or package that would help me get closer to my goal? (I have searched Packagist to no avail.) If not, how would you approach this challenge? There is no need to write code; just point me in the right direction.

As a desperate fallback I'll have to resort to porting the bash source code itself, but I really hope that someone will suggest a shortcut.


Solution

  • Trying to create a parser for bash is a very ambitious (and ambiguous) project. Bash is constantly evolving and, in certain areas, pushes beyond the POSIX standard. Consider scaling the project down: maybe target the POSIX shell, which will cover many shell variants (dash, ash, ...).

    Consider starting with the POSIX specification: https://pubs.opengroup.org/onlinepubs/9699919799/ It identifies a few quoting mechanisms (backslash escapes, single quotes, and double quotes). If you implement those carefully, your approach (replacing end-of-lines with ';') may work.

    Another alternative is to start from a bash syntax highlighter, for example the Vim one (/usr/share/vim/vimNN/syntax/sh.vim, where NN is the Vim version).
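The quoting-aware approach suggested above (track the quoting state, then replace only unquoted line ends) could be sketched roughly as follows. This is a minimal illustration, not a shell parser: the names `squashShell` and `needsSeparator` are hypothetical, the reserved-word check is deliberately naive, and comments, here-documents, `case` patterns and `$'...'` strings in the input are not handled. It also emits `; ` rather than `;` as the separator.

```php
<?php
// Hypothetical helper: does a ';' belong between this line and the next?
function needsSeparator(string $line): bool
{
    // Already terminated, or ends in an operator that expects more input.
    if (preg_match('/[;&|(]$/', $line) === 1) {
        return false;
    }
    $words = preg_split('/\s+/', $line);
    $last = end($words);
    // After these reserved words a newline is plain whitespace. Naive: an
    // ordinary argument spelled "do" would be misread as the keyword.
    return !in_array($last, ['do', 'then', 'else', 'elif', 'if', 'while', 'until', '{'], true);
}

function squashShell(string $input): string
{
    $src = str_replace(["\r\n", "\r"], "\n", $input); // only "\n" from here on
    $lines = [];
    $line = '';
    $state = 'plain'; // 'plain', 'single' or 'double'
    $len = strlen($src);

    for ($i = 0; $i < $len; $i++) {
        $c = $src[$i];
        $next = $i + 1 < $len ? $src[$i + 1] : '';

        if ($state !== 'single' && $c === '\\' && $next === "\n") {
            $i++;                       // line continuation: drop both characters
        } elseif ($state !== 'single' && $c === '\\' && $next !== '') {
            $line .= $c . $next;        // keep the escaped character untouched
            $i++;
        } elseif ($c === "\n" && $state === 'single') {
            $line .= "'\$'\\n''";       // emits: '$'\n''
        } elseif ($c === "\n" && $state === 'double') {
            $line .= "\"\$'\\n'\"";     // emits: "$'\n'"
        } elseif ($c === "\n") {
            $trimmed = trim($line, " \t");
            if ($trimmed !== '') {
                $lines[] = $trimmed;    // end of one logical line
            }
            $line = '';
        } else {
            $line .= $c;
            if ($state === 'plain' && ($c === "'" || $c === '"')) {
                $state = $c === "'" ? 'single' : 'double';
            } elseif (($state === 'single' && $c === "'") || ($state === 'double' && $c === '"')) {
                $state = 'plain';
            }
        }
    }
    $trimmed = trim($line, " \t");
    if ($trimmed !== '') {
        $lines[] = $trimmed;
    }

    // Join the logical lines, adding ';' only where the shell needs one.
    $out = '';
    $lastIndex = count($lines) - 1;
    foreach ($lines as $k => $l) {
        $out .= $l;
        if ($k < $lastIndex) {
            $out .= needsSeparator($l) ? '; ' : ' ';
        }
    }
    return $out;
}

echo squashShell("cat file1 \\\nfile2 \\\nfile3"), "\n";
// prints: cat file1 file2 file3
```

Run against the four sample inputs from the question, this sketch gives the expected outputs apart from the added space after `;`. Anything beyond those cases should be checked against the POSIX quoting rules before being relied on.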