bash, pipe, process-substitution

What is the difference between using process substitution vs. a pipe?


I came across an example of using the tee utility in the tee info page:

wget -O - http://example.com/dvd.iso | tee >(sha1sum > dvd.sha1) > dvd.iso

I looked up the >(...) syntax and found something called "process substitution". From what I understand, it makes a process look like a file that another process could write/append its output to. (Please correct me if I'm wrong on that point.)

How is this different from a pipe (|)? I see a pipe is being used in the above example. Is it just a precedence issue, or is there some other difference?


Solution

  • There's no benefit here, as the line could equally well have been written like this:

    wget -O - http://example.com/dvd.iso | tee dvd.iso | sha1sum > dvd.sha1
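
As for the "looks like a file" intuition in the question, it can be checked directly. The following is a minimal sketch, assuming bash (process substitution is not available in plain POSIX sh); the variable names are made up for illustration. Each substitution expands to a file name, which on Linux is typically a /dev/fd/N path, though the exact name depends on the system.

```shell
#!/usr/bin/env bash
# Each substitution expands to a file name connected to the process.
# On Linux this is typically /dev/fd/N; the exact name varies by system.
write_path=$(echo >(cat >/dev/null))
read_path=$(echo <(true))
echo "write end: $write_path"
echo "read end:  $read_path"

# Because it is just a file name, a command that expects a file argument can open it.
line_count=$(wc -l <(printf 'a\nb\nc\n') | awk '{print $1}')
echo "lines: $line_count"
```

So a pipe connects stdout to stdin, while a process substitution hands the command a file name it can open like any other file argument.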
    

    The difference appears when you need to pipe to or from several programs at once, because a single | connects exactly one writer to one reader. Feel free to try:

    # Calculate 2+ checksums while also writing the file
    wget -O - http://example.com/dvd.iso | tee >(sha1sum > dvd.sha1) >(md5sum > dvd.md5) > dvd.iso
    
    # Accept input from two 'sort' processes at the same time
    comm -12 <(sort file1) <(sort file2)
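
The two commands above can be tried without a network connection. This is only a sketch under a few assumptions: bash, GNU coreutils (sha1sum, md5sum), a printf payload standing in for the wget download, and a hypothetical scratch directory. Bash does not wait for process substitutions to finish, so here they print to stdout and the surrounding $(...) collects both results before continuing.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # fixed sort order so both sorts agree with comm

scratch=$(mktemp -d)   # hypothetical scratch directory for the demo

# One writer, several readers: tee feeds both checksum programs and the file.
# The substitutions inherit stdout, so $(...) waits for both to finish.
sums=$(printf 'alpha\nbeta\ngamma\n' |
  tee >(sha1sum) >(md5sum) > "$scratch/payload.bin")
printf '%s\n' "$sums"

# Several writers, one reader: comm reads two sorted streams at once.
common=$(comm -12 <(printf '%s\n' pear apple fig | sort) \
                  <(printf '%s\n' fig kiwi apple | sort))
printf '%s\n' "$common"   # prints apple, then fig

# Cross-check against a plain run over the saved file, then clean up.
expected_sha1=$(sha1sum < "$scratch/payload.bin")
rm -rf "$scratch"
```

The order of the two checksum lines in sums is not guaranteed, since both programs run concurrently.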
    

    Process substitutions are also useful when, for whatever reason, you can't or don't want to use a pipeline:

    # Start logging all error messages to a file as well as the terminal
    # A pipe doesn't work because exec only accepts redirections, not a pipeline
    exec 2> >(tee log.txt)
    ls doesntexist
    
    # Sum a column of numbers
    # A pipe doesn't work because the loop would run in a subshell, so sum would be lost
    sum=0
    while IFS= read -r num; do (( sum+=num )); done < <(curl http://example.com/list.txt)
    echo "$sum"
    
    # Install a package with apt-get, passing a generated config file
    # A pipe doesn't work because stdin must stay available for user input
    apt-get install -c <(sed -e "s/%USER%/$USER/g" template.conf) mysql-server
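
The subshell point in the column-summing example can be checked locally. A minimal sketch, assuming bash with default settings (in particular, the lastpipe option is off); the numbers function is a hypothetical stand-in for the curl download.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for: curl http://example.com/list.txt
numbers() { printf '%s\n' 10 20 30; }

# Pipe: the while loop runs in a subshell, so its changes to sum_pipe are lost.
sum_pipe=0
numbers | while IFS= read -r num; do (( sum_pipe += num )); done
echo "after pipe: $sum_pipe"                          # prints 0

# Process substitution: the loop runs in the current shell and keeps the total.
sum_procsub=0
while IFS= read -r num; do (( sum_procsub += num )); done < <(numbers)
echo "after process substitution: $sum_procsub"      # prints 60
```

In the second form only the producer (numbers) runs in a separate process; the loop itself stays in the shell that later reads the variable.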