pgrep matches undesired patterns


This is the beginning of a script that decides whether to run tmux. It has been modified for debugging purposes, and no tmux instance is ever running in this test example:

#!/bin/bash -x

if  [[ -n "$TMUX" ]]; then
    echo "We are inside tmux"
    exit 1
fi

if [[ $(pgrep -c tmux) -ne 0 ]]; then
    echo "joining tmux session"
    tmux a
    exit 0
fi

bash

Script run on terminal 1: alacritty -e ~/tmux.sh

Output on terminal 2:

+ [[ -n '' ]]
++ pgrep -c tmux
+ [[ 1 -ne 0 ]]
+ echo 'joining tmux session'
joining tmux session
+ tmux a
no sessions
+ bash
[user@hostname ~]$ pgrep -a tmux
27220 /bin/bash -x /home/user/tmux.sh

I expected the condition to be false, [[ 0 -ne 0 ]]. pgrep should not(?) match /bin/bash -x /home/user/tmux.sh, since I'm looking for a tmux process, not for the string tmux in some process's arguments. Moreover, if I run vim ~/tmux.sh in one terminal and pgrep -c tmux in another, it returns 0 (with no tmux instance running). So this behavior does not seem to be consistent.

Why am I getting different results? Thanks.

Update 1, System info:
alacritty 0.13.2 (bb8ea18e)
bash 5.2.26(1)-release
pgrep from procps-ng 4.0.4
archlinux

Update 2: It is worth noting that when the script is run with sh/bash, pgrep behaves as expected. It is only when run through alacritty -e ~/tmux.sh that this confusing behavior arises.
Update 3: It also happens when the script is run from a foot or xterm terminal, e.g. foot ~/tmux.sh.
Update 4: video


Solution

  • The confusion arises because, behind the scenes, some non-obvious things happen when one attempts to execute a text file (i.e. a file that is not a binary executable). Additionally, the string that pgrep -a regex prints may not be the string it matched the regex against.

    • Processes are generally invoked by execve(2). Its manpage notes:

    The process name, as set by prctl(2) PR_SET_NAME (and displayed by ps -o comm), is reset to the name of the new executable file

    • Text files are not directly executable. When one attempts to run a text file, the kernel notices that the file does not start with the normal magic string that indicates a program (typically \177 E L F \002 \001 \001 \0 \0 \0 \0 \0 \0 \0 \0 \0).
      • If the kernel sees that the file starts with #!, it splits the remainder of the first line into two parts and executes the first part (the interpreter), passing as arguments the second part (often empty, or a flag telling the interpreter to expect a script) and the path of the file that was originally executed (typically used by the interpreter as the filename of the script it should run).
      • The kernel does not reset the ps comm value to the interpreter's name; it remains the name of the script.
    • pgrep regex tries to match a regular expression against the "process name" (ps comm value).
      • This can be confusing in the presence of processes invoked as above, if one expects that "process name" should be the name of the interpreter (as pgrep -a regex and the pgrep manpage could lead one to believe is the case).
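The steps above can be sketched in shell. This only approximates what the kernel does and is not its actual logic; shebang_argv and comm_name are hypothetical helper names, and the 15-character cut assumes the Linux limit on the comm value.

```shell
#!/bin/bash
# Sketch only: approximates how Linux handles a "#!" line.
# shebang_argv and comm_name are hypothetical names used for illustration.

# Print the argument vector the interpreter would receive, one item per line.
shebang_argv() (
    script=$1
    IFS= read -r line <"$script" || [ -n "$line" ] || exit 1
    case $line in
        '#!'*) line=${line#'#!'} ;;
        *) exit 1 ;;                           # not a "#!" script
    esac
    line=${line#"${line%%[![:space:]]*}"}      # strip leading blanks
    line=${line%"${line##*[![:space:]]}"}      # strip trailing blanks
    [ -n "$line" ] || exit 1
    interp=${line%%[[:space:]]*}               # first part: the interpreter
    arg=${line#"$interp"}                      # second part: one optional argument
    arg=${arg#"${arg%%[![:space:]]*}"}
    printf '%s\n' "$interp"
    if [ -n "$arg" ]; then printf '%s\n' "$arg"; fi
    printf '%s\n' "$script"                    # path that was originally executed
)

# Print the comm value such a process would get: the script's base name,
# cut to 15 characters (assumed Linux limit).
comm_name() (
    base=${1##*/}
    printf '%s\n' "$base" | cut -c1-15
)

# Demo with a stand-in for the script from the question.
demo_dir=$(mktemp -d)
printf '#!/bin/bash -x\n' >"$demo_dir/tmux.sh"
shebang_argv "$demo_dir/tmux.sh"
comm_name "$demo_dir/tmux.sh"
rm -r "$demo_dir"
```

For the script in the question this gives the arguments /bin/bash, -x and the script path, with a comm value of tmux.sh. That name contains "tmux", which is what pgrep -c tmux matched.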

    I don't know Rust, so I can't follow the alacritty source code, but running under strace I see that alacritty -e ./myscript does eventually do execve("./myscript", [...]).
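One way to sidestep the name matching in a script like the one in the question is to ask tmux itself whether a session exists, instead of counting processes with pgrep. This is a minimal sketch, assuming that tmux has-session exits non-zero when there is no server or no session. tmux_has_session and choose_action are hypothetical names, and the sketch only prints its decision rather than attaching.

```shell
#!/bin/bash
# Sketch only: decide what to do without matching process names.
# tmux_has_session and choose_action are hypothetical helper names.

# Succeeds only if tmux reports an existing session (assumption: has-session
# fails when no server is running or no session exists).
tmux_has_session() {
    tmux has-session 2>/dev/null
}

# Print one of: inside, attach, shell.
choose_action() (
    if [ -n "${TMUX:-}" ]; then
        echo inside
    elif tmux_has_session; then
        echo attach
    else
        echo shell
    fi
)

case $(choose_action) in
    inside) echo "We are inside tmux" ;;
    attach) echo "joining tmux session" ;;   # the real script would run: tmux a
    shell)  echo "no tmux session found" ;;  # the real script would run: bash
esac
```

Because the decision no longer depends on a process name, it is unaffected by the script itself being called tmux.sh.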


    As an example, here is a bash script that invokes a Perl script that resets its process name. See the confusion that results when searching for it with pgrep. Note the different values of /proc/#pid/exe, /proc/#pid/comm and /proc/#pid/cmdline:

    tester:

    #!/bin/bash
    
    cat >myscript <<'EOD'
    #!/usr/bin/perl
    
    use Linux::Prctl(set_name);
    sleep 1;
    set_name("gotcha!");
    sleep 3;
    EOD
    
    chmod +x myscript
    
    myinfo()(
        echo
        echo exe:
        ls -l /proc/$!/exe
        echo ---
        echo comm:
        cat /proc/$!/comm
        echo ---
        echo cmdline:
        tr '\0' '\n' </proc/$!/cmdline
        echo ----
        echo pgrep -a myscript:
        pgrep -a myscript
        echo ----
        echo pgrep -a gotcha:
        pgrep -a gotcha
        echo
    )
    
    ./myscript &
    echo "=== (directly) before ==="
    myinfo
    sleep 2
    echo "=== (directly) after ==="
    myinfo
    
    sleep 5
    
    /usr/bin/perl ./myscript &
    echo "=== (indirectly) before ==="
    myinfo
    sleep 2
    echo "=== (indirectly) after ==="
    myinfo
    
    === (directly) before ===
    
    exe:
    lrwxrwxrwx 1 jhnc jhnc 0 Apr 13 18:50 /proc/252563/exe -> /usr/bin/perl
    ---
    comm:
    myscript
    ---
    cmdline:
    /usr/bin/perl
    ./myscript
    ----
    pgrep -a myscript:
    252563 /usr/bin/perl ./myscript
    ----
    pgrep -a gotcha:
    
    === (directly) after ===
    
    exe:
    lrwxrwxrwx 1 jhnc jhnc 0 Apr 13 18:50 /proc/252563/exe -> /usr/bin/perl
    ---
    comm:
    gotcha!
    ---
    cmdline:
    /usr/bin/perl
    ./myscript
    ----
    pgrep -a myscript:
    ----
    pgrep -a gotcha:
    252563 /usr/bin/perl ./myscript
    
    === (indirectly) before ===
    
    exe:
    lrwxrwxrwx 1 jhnc jhnc 0 Apr 13 18:50 /proc/252582/exe -> /usr/bin/perl
    ---
    comm:
    perl
    ---
    cmdline:
    /usr/bin/perl
    ./myscript
    ----
    pgrep -a myscript:
    ----
    pgrep -a gotcha:
    
    === (indirectly) after ===
    
    exe:
    lrwxrwxrwx 1 jhnc jhnc 0 Apr 13 18:50 /proc/252582/exe -> /usr/bin/perl
    ---
    comm:
    gotcha!
    ---
    cmdline:
    /usr/bin/perl
    ./myscript
    ----
    pgrep -a myscript:
    ----
    pgrep -a gotcha:
    252582 /usr/bin/perl ./myscript
    
    

    summary:

                         directly before   directly after   indirectly before   indirectly after
    /proc/#pid/comm      myscript          gotcha!          perl                gotcha!
    pgrep -a myscript    found             not found        not found           not found
    pgrep -a gotcha      not found         found            not found           found

    /proc/#pid/exe and /proc/#pid/cmdline don't change.

    When pgrep -a succeeds it always outputs cmdline (not the comm value).
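To see both values side by side for a given process, the two /proc files can be read directly. This is a minimal sketch, assuming a Linux /proc filesystem; proc_names is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: print the comm value (what pgrep matches by default) next to
# the command line (what pgrep -a prints). proc_names is a hypothetical name.

proc_names() (
    pid=$1
    [ -r "/proc/$pid/comm" ] || exit 1
    printf 'comm:    %s\n' "$(cat "/proc/$pid/comm")"
    # cmdline is NUL-separated; show it with spaces instead.
    printf 'cmdline: %s\n' "$(tr '\0' ' ' <"/proc/$pid/cmdline")"
)

# Demo on the current shell; for pgrep matches one could loop instead:
#   for pid in $(pgrep tmux); do proc_names "$pid"; done
proc_names "$$" || echo "no readable /proc entry for $$"
```

If the comm line contains the search pattern but the cmdline line looks unrelated, the match came from the process name and not from the command line.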


    These tests were run against the procps-3.3.17 version of pgrep in Ubuntu 22.04. I don't know if behaviour in newer versions of procps has changed; the source code looks rather different.