python security password-protection pid shlex

Password management for non-interactive processes

The challenge

I need a password management tool that can be invoked by other processes (scripts of all sorts: Python, PHP, Perl, etc.). It must identify and verify the calling script in order to perform access control: either return the password or exit with -1.

The current implementation

After looking into various frameworks, I decided to use Python's keepassdb, which can handle KeePass 1.x backend database files, and to build my own access control overlay on top of it (since this can later be customized and integrated with our LDAP for user/group access). Access control is done by overloading the notes field of each entry with a list of SHA-256 hashes of the scripts that are allowed to access the password. (Because the hash covers the script's contents, this also detects any modification of the script.)
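
To illustrate the notes-field overlay, here is a minimal sketch. It assumes the notes field holds one hex-encoded SHA-256 digest per line, possibly mixed with free text. That layout and the names allowed_hashes and is_allowed are assumptions made for illustration and are not part of keepassdb. hash_file is a guess at the helper used further down.

```python
import hashlib
import re

# Assumed layout: one hex SHA-256 digest per line in the notes field;
# any other line is treated as free text and ignored.
HASH_RE = re.compile(r"^[0-9a-f]{64}$")


def hash_file(fobj, hasher, blocksize=65536):
    """Feed a binary file object to the hasher in chunks; return the hex digest."""
    for chunk in iter(lambda: fobj.read(blocksize), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


def allowed_hashes(notes):
    """Extract the set of digests listed in an entry's notes text."""
    lines = (line.strip().lower() for line in notes.splitlines())
    return {line for line in lines if HASH_RE.match(line)}


def is_allowed(script_path, notes):
    """True if the script's current SHA-256 digest is listed in the notes."""
    with open(script_path, "rb") as fobj:
        digest = hash_file(fobj, hashlib.sha256())
    return digest in allowed_hashes(notes)
```

Any edit to the script changes its digest, so the lookup fails until the new digest is added to the entry.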

The password manager is called with a -p parameter, which is the PID of the calling script/application, and performs the following steps:

Walk recursively "up" from its own PID through its parents. The caller's PID has to be found before reaching process 1, which is init and has parent 0. This way we are sure which process called this password manager instance.
Get the full command line of that (parent) process and analyse it, looking for scripting languages including python, perl, php, bash, bat, groovy, etc. (shlex is used for this).
Work out the absolute path of the script and calculate its SHA-256 hash.
Compare the hash to the values in the database. If it is present, the script is allowed to have the password, which is returned on stdout in a standard format. If not, exit with -1.
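
The first step (walking up the process tree) could look roughly like the sketch below. It is not the walk_ppids function used later in this post. The names ps_ppid and is_ancestor are hypothetical, and the parent lookup is passed in as a parameter so the walk can be exercised without spawning ps.

```python
import subprocess


def ps_ppid(pid):
    """Return the parent PID of `pid` as reported by ps, or None on failure."""
    try:
        out = subprocess.check_output(["ps", "-p", str(int(pid)), "-o", "ppid="])
    except (subprocess.CalledProcessError, OSError, ValueError):
        return None
    out = out.decode().strip()
    return int(out) if out.isdigit() else None


def is_ancestor(claimed_pid, start_pid, get_ppid=ps_ppid):
    """Walk up from start_pid; True if claimed_pid is met before reaching PID 1."""
    pid = get_ppid(start_pid)
    while pid is not None and pid > 1:
        if pid == claimed_pid:
            return True
        pid = get_ppid(pid)
    return False
```

The password manager would call something like is_ancestor(pid_from_dash_p, os.getpid()) and refuse to continue when it returns False.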

The problem

The above implementation works nicely for legitimate scripts, but it is very easy to confuse. Let caller.py be a script that is allowed access to a specific entry e. When it runs, its command line looks like python /path/to/caller.py arg1 arg2. The code that parses the command line is:

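import hashlib
import os
import re
import shlex
import sys

# walk_ppids() returns the caller's command line, or False if pid is not an ancestor
cmd = walk_ppids(pid)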
lg.debug(cmd)
if cmd is False:
    lg.error("PID %s is not my parent process or not a process at all!" % pid)
    sys.exit(-1)

cmd_parts = shlex.split(cmd)
running_script = ""
for p in cmd_parts:
    if re.search(r"\.(php|py|pl|sh|groovy|bat)$", p, re.I):
        running_script = p
        break

if not running_script:
    lg.error("Cannot identify this script's name/path!")
    sys.exit(-1)

running_script = os.path.abspath(running_script)
lg.debug("Found "+running_script)

with open(running_script, 'rb') as f:
    phash = hash_file(f, hashlib.sha256())

The command line of the parent process is acquired using:

os.popen("ps -p %s -o args=" % ppid).read().strip()
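
Since the PID ultimately comes from the -p argument, interpolating it into a shell string is worth avoiding. A possible alternative, sketched here with the hypothetical helper names ps_args_argv and parent_cmdline, is to pass ps an argument list and force the PID through int() first:

```python
import subprocess


def ps_args_argv(pid):
    """Build the ps argument vector; int() raises ValueError for a non-numeric PID."""
    return ["ps", "-p", str(int(pid)), "-o", "args="]


def parent_cmdline(pid):
    """Return the command line of `pid`, or "" if ps cannot report it."""
    try:
        # No shell is involved, so the PID cannot smuggle in extra commands.
        out = subprocess.check_output(ps_args_argv(pid))
    except (subprocess.CalledProcessError, OSError):
        return ""
    return out.decode().strip()
```

Unlike os.popen, this raises ValueError for input such as "1; rm -rf /" instead of handing it to a shell.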

Now, the easiest way to confuse the above code is to create a shell script without the .sh extension that takes caller.py as its first argument. The shell script does not use its arguments; instead, it invokes the password manager and queries for entry e. The command line would look like fake_sh ./caller.py, and so the above code returns the password... which is the wrong thing to do.
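
The confusion can be reproduced in isolation. The sketch below copies only the extension-matching loop into a hypothetical function first_script and feeds it both command lines:

```python
import re
import shlex

SCRIPT_RE = re.compile(r"\.(php|py|pl|sh|groovy|bat)$", re.I)


def first_script(cmd):
    """Mimic the loop above: return the first token with a known extension."""
    for part in shlex.split(cmd):
        if SCRIPT_RE.search(part):
            return part
    return ""


# Legitimate call: the interpreter really runs caller.py.
print(first_script("python /path/to/caller.py arg1 arg2"))  # /path/to/caller.py
# Spoofed call: fake_sh is what actually runs, yet caller.py is reported.
print(first_script("fake_sh ./caller.py"))  # ./caller.py
```

In both cases the hash of caller.py is what gets checked, so the spoofed call passes.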

The Questions

One would assume that this is a common problem, solved long ago, so that programmers need not hard-code passwords into scripts/apps. However, I researched for a few days and could not find anything that works in a similar way. I understand that this question is rather open-ended, so I will accept answers to any of the following:

Am I re-inventing the wheel? Is there a framework/software that will do something similar?
Is this the correct approach, relying on PIDs? Is there another way?
Implementation-wise, could the posted code be made more robust and less easily confused? (the shlex analysis part)

Solution

Improvement: Making the rules more strict

The first step was to confirm that each extension runs on the correct interpreter, which means that caller.py cannot be run by /bin/bash.

Similar vulnerabilities can be exploited with python, for example with the command python -W ./caller.py ./myUberHack.py. A command line analyzer that looks for the first .py argument to the interpreter will think that caller.py is running... which it is not.

Building all the invocation rules for all interpreters would be too time-consuming, so I hard-coded the assumptions. These are stored in a tuple, and each row is:

(file extension, position of the script in the command line, prefix of the interpreter name)

exts = (
    (".py", 1, "python"), 
    (".php", 2, "php"),
    (".pl", 1, "perl"),
    (".sh", 1, "/bin/bash"), # Assumption, we accept only bash 
    (".groovy", 1, "groovy"),
    (".rb", 1, "ruby"),
)
"""Matching extensions to positional arguments and interpreters"""

And the validation code now is:

for ext, pos, interp in exts:
    # Skip rules whose position is past the end of the command line
    if len(cmd_parts) <= pos:
        continue

    # Check the specified cmdline position and extension
    if cmd_parts[pos].strip().endswith(ext):
        lg.debug("Checking " + cmd_parts[pos])
        running_script = cmd_parts[pos]

        # Make sure that the interpreter matches the extension
        if not cmd_parts[0].startswith(interp):
            lg.error("Wrong interpreter... go away...")
            sys.exit(-1)

        break
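
To try the stricter rules against the command lines discussed in this post, the table and check can be wrapped in a standalone function. This is only a sketch: identify_script is a hypothetical name, and it returns None instead of logging and exiting.

```python
import shlex

EXTS = (
    (".py", 1, "python"),
    (".php", 2, "php"),
    (".pl", 1, "perl"),
    (".sh", 1, "/bin/bash"),
    (".groovy", 1, "groovy"),
    (".rb", 1, "ruby"),
)


def identify_script(cmd, exts=EXTS):
    """Return the script path if the command line matches a rule, else None."""
    parts = shlex.split(cmd)
    for ext, pos, interp in exts:
        if len(parts) <= pos or not parts[pos].strip().endswith(ext):
            continue
        # Extension found at the expected position: the interpreter must match.
        return parts[pos] if parts[0].startswith(interp) else None
    return None


for cmd in (
    "python /path/to/caller.py arg1 arg2",    # accepted
    "fake_sh ./caller.py",                    # rejected: wrong interpreter
    "python -W ./caller.py ./myUberHack.py",  # rejected: no script at position 1
    "/usr/bin/python caller.py",              # rejected: prefix match fails
):
    print(cmd, "->", identify_script(cmd))
```

The last case shows a limitation of the prefix match: an interpreter invoked by its absolute path is rejected, so the rules may need to compare against os.path.basename(parts[0]) or list full paths explicitly.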

Can't think of anything better at the moment...