
subprocess.call using string vs using list


I am trying to use rsync with subprocess.call. Oddly, it works if I pass subprocess.call a string, but it won't work with a list.

calling sp.call with a string:

In [23]: sp.call("rsync -av content/ writings_raw/", shell=True)
sending incremental file list

sent 6236 bytes  received 22 bytes  12516.00 bytes/sec
total size is 324710  speedup is 51.89
Out[23]: 0

calling sp.call with a list:

In [24]: sp.call(["rsync", "-av", "content/", "writings_raw/"], shell=True)
rsync  version 3.0.9  protocol version 30
Copyright (C) 1996-2011 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 32-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.

rsync is a file transfer program capable of efficient remote update
via a fast differencing algorithm.

Usage: rsync [OPTION]... SRC [SRC]... DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST
  or   rsync [OPTION]... SRC [SRC]... rsync://[USER@]HOST[:PORT]/DEST
  or   rsync [OPTION]... [USER@]HOST:SRC [DEST]
  or   rsync [OPTION]... [USER@]HOST::SRC [DEST]
  or   rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]
The ':' usages connect via remote shell, while '::' & 'rsync://' usages connect
to an rsync daemon, and require SRC or DEST to start with a module name.

Options
 -v, --verbose               increase verbosity
 -q, --quiet                 suppress non-error messages
     --no-motd               suppress daemon-mode MOTD (see manpage caveat)
... snipped....
                             repeated: --filter='- .rsync-filter'
     --exclude=PATTERN       exclude files matching PATTERN
     --blocking-io           use blocking I/O for the remote shell
 -4, --ipv4                  prefer IPv4
 -6, --ipv6                  prefer IPv6
     --version               print version number
(-h) --help                  show this help (-h is --help only if used alone)
...snipped ...
rsync error: syntax or usage error (code 1) at main.c(1438) [client=3.0.9]
Out[24]: 1

What is wrong with how I use the list? How would you fix it? I need the list, because I would like to use variables. Of course I could use:

  sp.call("rsync -av "+Orig+" "+Dest, shell=True)    

But I would like to understand how subprocess understands lists vs. strings.

setting shell=False and a list:

In [36]: sp.call(['rsync', '-av', ORIG, DEST], shell=False)
sending incremental file list

sent 6253 bytes  received 23 bytes  12552.00 bytes/sec
total size is 324710  speedup is 51.74
Out[36]: 0

setting shell=False and a string

In [38]: sp.call("rsync -av"+" "+ORIG+" "+DEST, shell=False)
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-38-0d366d3ef8ce> in <module>()
----> 1 sp.call("rsync -av"+" "+ORIG+" "+DEST, shell=False)

/usr/lib/python2.7/subprocess.pyc in call(*popenargs, **kwargs)
    491     retcode = call(["ls", "-l"])
    492     """
--> 493     return Popen(*popenargs, **kwargs).wait()
    494 
    495 

/usr/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
    677                             p2cread, p2cwrite,
    678                             c2pread, c2pwrite,
--> 679                             errread, errwrite)
    680 
    681         if mswindows:

/usr/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
   1257                     if fd is not None:
   1258                         os.close(fd)
-> 1259                 raise child_exception
   1260 
   1261 

OSError: [Errno 2] No such file or directory

Solution

  • subprocess's rules for handling the command argument are actually a bit complex.

    Generally speaking, to run external commands, you should use shell=False and pass the arguments as a sequence. Use shell=True only if you need to use shell built-in commands or specific shell syntax; using shell=True correctly is platform-specific as detailed below.

    From the docs:

    args should be a sequence of program arguments or else a single string. By default, the program to execute is the first item in args if args is a sequence. If args is a string, the interpretation is platform-dependent and described below. See the shell and executable arguments for additional differences from the default behavior. Unless otherwise stated, it is recommended to pass args as a sequence.... If shell is True, it is recommended to pass args as a string rather than as a sequence.

    With shell=False:

    On Unix, if args is a string, the string is interpreted as the name or path of the program to execute. However, this can only be done if not passing arguments to the program.

    On Windows, if args is a sequence, it will be converted to a string in a manner described in Converting an argument sequence to a string on Windows. This is because the underlying CreateProcess() operates on strings.

    With shell=True:

    On Unix with shell=True, the shell defaults to /bin/sh. If args is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash escaping filenames with spaces in them. If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself.

    On Windows with shell=True, the COMSPEC environment variable specifies the default shell. The only time you need to specify shell=True on Windows is when the command you wish to execute is built into the shell (e.g. dir or copy). You do not need shell=True to run a batch file or console-based executable.

    (all emphasis mine)
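The Windows conversion mentioned in the quoted documentation can be looked at from Python on any platform. This is a small sketch that assumes CPython, where the subprocess module ships a helper named list2cmdline. It is not part of the documented API, so treat it as a way to inspect the result rather than something to build on. The argument list is made up for illustration.

```python
import subprocess

# Hypothetical argument list; "my dir/" contains a space on purpose.
args = ["rsync", "-av", "my dir/", "b/"]

# list2cmdline joins the list into a single command-line string, adding
# double quotes around arguments that contain whitespace.
command_line = subprocess.list2cmdline(args)
print(command_line)
```

The point is only that a list is still the safer input on Windows: the quoting is done for you instead of by hand.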


    For completeness, here's what happens in each of your four examples on a UNIX system:

    string with shell=True

    subprocess.call("rsync -av a/ b/", shell=True) will invoke sh -c "rsync -av a/ b/", which executes the shell script rsync -av a/ b/; the shell will parse this as a call to rsync with arguments -av, a/, b/, so it works fine.

    Note that if any argument contained a space or special shell character it would need to be manually escaped, making this a fragile approach.
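If a string with shell=True really is required, the standard library's shlex.quote can do the escaping. The following is a minimal sketch, assuming a POSIX shell and Python 3.7+; the directory name is hypothetical, and printf stands in for rsync so it runs without rsync installed.

```python
import shlex
import subprocess

# Hypothetical source directory containing a space and a single quote.
src = "my content's dir/"

# shlex.quote wraps the value so the shell sees it as one literal word.
command = "printf '%s\\n' " + shlex.quote(src)

# The shell parses the string; printf receives src as a single argument.
output = subprocess.check_output(command, shell=True, text=True)
print(output)
```

Without the shlex.quote call, the shell would split the name at the space and choke on the unbalanced quote.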

    list with shell=True

    subprocess.call(["rsync", "-av", "a/", "b/"], shell=True) will invoke sh -c "rsync" -av a/ b/, which executes the shell script rsync, setting $0 to -av, $1 to a/, and $2 to b/. This shell script just invokes rsync with no arguments (ignoring $0, $1, $2), which is why you get a screenful of help text.

    One way to make this work would be subprocess.call(['rsync "$@"', "rsync", "-av", "a/", "b/"], shell=True). This invokes a shell script that passes its arguments through to rsync. Note the extra dummy rsync argument, which is needed to set $0 because the expansion of $@ starts at $1. This is not an ideal solution, which is why a sequence is very rarely used with shell=True.
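The way the extra list items map onto $0, $1 and so on can be checked directly. This is a sketch assuming a Unix system with /bin/sh and Python 3.7+; printf stands in for rsync so the positional parameters are visible.

```python
import subprocess

# With a list and shell=True, only the first item is the shell script.
# The remaining items become $0, $1, $2 of that script.
script = 'printf "%s|%s|%s\\n" "$0" "$1" "$2"'
positional = subprocess.check_output(
    [script, "-av", "a/", "b/"], shell=True, text=True
)
print(positional)

# The "$@" workaround: "dummy" fills $0, and "$@" expands to everything
# from $1 onwards, one word per argument.
forwarded = subprocess.check_output(
    ['printf "%s\\n" "$@"', "dummy", "-av", "a/", "b/"], shell=True, text=True
)
print(forwarded)
```

The first call shows -av landing in $0, which is exactly why rsync in the question saw no arguments at all.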

    string with shell=False

    subprocess.call("rsync -av a/ b/") will try to execute a program whose name is the entire string rsync -av a/ b/. Since no such program exists, subprocess raises OSError: [Errno 2] No such file or directory. On Unix, there is no way to provide any arguments to the program when using a string with shell=False.
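If the command only exists as a string, shlex.split can turn it into a list without involving a shell. A short sketch, assuming a Unix system (on Windows a string with shell=False is handled differently, as the quoted docs describe):

```python
import shlex
import subprocess

command = "rsync -av a/ b/"

# shell=False with a string: the whole string is taken as the program name,
# so the lookup fails before rsync is ever considered.
try:
    subprocess.call(command)
    lookup_failed = False
except OSError:  # raised as FileNotFoundError on Python 3
    lookup_failed = True

# shlex.split applies shell-like word splitting without running a shell.
argv = shlex.split(command)
print(lookup_failed, argv)
```

The resulting argv is the same list used in the last case, so it can be passed to subprocess.call as is.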

    list with shell=False

    subprocess.call(["rsync", "-av", "a/", "b/"]) invokes the rsync binary on your $PATH, passing rsync as argv[0], -av as argv[1], a/ as argv[2] and b/ as argv[3]. No escaping of arguments is needed as they are passed straight through to the execve system call.
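To see that list items really do arrive untouched, here is a sketch that assumes Python 3.7+ and uses a small Python child process in place of rsync, so it runs without rsync installed. The argument values are hypothetical and chosen to be things a shell would normally split or expand.

```python
import subprocess
import sys

# Hypothetical arguments that a shell would otherwise split or expand.
tricky = ["two words", "$HOME", "*.txt", "semi;colon"]

# The child prints the arguments it received, exactly as it received them.
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
received = subprocess.check_output(child + tricky, text=True)
print(received)
```

Each item shows up in the child as one argument, character for character, which is what makes this form the right choice when the arguments come from variables.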