multithreading haskell concurrency functional-programming pipe

What determines the lifetime of the process spawned via System.Process.createProcess?

tl;dr

If I run in the shell either this

$ grep hello

or this

$ socat tcp-listen:12345,fork -

the result is the same, i.e. the program blocks, waiting for standard input.

Why does wrapping those two commands in the following Haskell programs (which differ only in the arguments passed to proc) produce different results?

Indeed, when this one

module Main where
import System.Process
main :: IO ()
main = do
  _ <- createProcess (proc "grep" ["hello"])
                {std_in = CreatePipe, std_out = CreatePipe}
  return ()

returns, it doesn't leave a grep process running, at least based on pidof grep, whereas this other one

module Main where
import System.Process
main :: IO ()
main = do
  _ <- createProcess (proc "socat" ["tcp-listen:12345,fork", "-"])
                {std_in = CreatePipe, std_out = CreatePipe}
  return ()

seems to leave socat running, according to pidof.

Some context

I am experimenting with System.Process and Control.Concurrent.Async for the purpose of spawning an external process, as well as threads to control its standard input and output.

However, while playing around with those libraries, I've managed to create a situation where 2 instances of my Haskell program communicate with each other (via the external socat process), the threads of each program communicate via an MVar to decide what to do, the two programs exit with 0, and yet... one of the two external socat processes spawned via System.Process.createProcess is left running, so I have to manually kill it.

Sure, maybe the issue lies in the rest of my program rather than in how I use createProcess, but to start with, I want to make sure I understand how createProcess should be used.

So the question is: once I execute something like this

import System.Environment (getArgs)
import System.Process
main :: IO ()
main = do
  args <- getArgs
  (Just i, Just o, Nothing, h) <- createProcess (proc "socat" args)
                                                {std_in = CreatePipe, std_out = CreatePipe}
  -- rest

who or what is responsible for putting down the process spawned by createProcess?

After all, -- rest executes immediately after the call to createProcess, so while -- rest executes, the external program (in this case socat) is running on its own. Is it up to me to guarantee correct lifetime management? Should I make use of h for this purpose?

One experiment is this:

import System.Process
main :: IO ()
main = do
  _ <- createProcess (proc "cat" ["/dev/zero"])
              {std_in = CreatePipe, std_out = CreatePipe}
  return ()

Since cat /dev/zero never returns (I've tried in the Bash shell), shouldn't this program terminate leaving cat running? I.e., after the Haskell program terminates successfully, shouldn't pidof cat return some PID?

I've tried, and it doesn't, making me think that something is cleaning up.

Solution

The reason why grep blocks instead of terminating in a terminal is that it keeps running as long as its stdin stays open, which in a terminal means until you send end-of-file (Ctrl-D). In the Haskell case, the stdin pipe you give grep via CreatePipe is closed as soon as the Haskell program terminates, which in turn makes grep exit.

On the other hand, socat doesn't care about its stdin handle and will stay running for as long as its network handle stays open, which is why it stays open forever in both cases.

You can test this difference in behavior in a terminal by ensuring that the stdin gets closed, e.g. like this

$ echo | grep hello
<immediately exits>
$ echo | socat tcp-listen:12345,fork -
<runs forever>

However, you can configure socat to care about handles being closed:

$ echo | socat tcp-listen:12345,fork -,end-close
<doesn't close immediately, but does close as soon as someone connects>
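The same effect can be reproduced from inside Haskell by closing grep's stdin explicitly instead of waiting for the program to exit. The following is only a minimal sketch: grepHello is a hypothetical helper name, and it assumes grep is on the PATH and that the input is small enough to be written in one go before any output is read.

```haskell
import System.Exit (ExitCode (..))
import System.IO (hClose, hGetContents, hPutStrLn)
import System.Process

-- Hypothetical helper: feed some lines to `grep hello`, close its stdin,
-- and return grep's exit code together with everything it printed.
grepHello :: [String] -> IO (ExitCode, String)
grepHello inputLines = do
  (Just i, Just o, _, h) <-
    createProcess (proc "grep" ["hello"])
      { std_in = CreatePipe, std_out = CreatePipe }
  mapM_ (hPutStrLn i) inputLines
  hClose i                  -- end-of-file on stdin: grep has no more input
  out <- hGetContents o     -- lazy read of grep's stdout
  -- Force the whole output before waiting, so grep never blocks on a full pipe.
  code <- length out `seq` waitForProcess h
  return (code, out)

main :: IO ()
main = grepHello ["hello world", "goodbye"] >>= print
```

Without the hClose i line, waitForProcess would block for as long as the Haskell side keeps the pipe open, which mirrors grep sitting in a terminal waiting for input.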

cat cares about both its input and its stdout: it keeps running until either its input has ended and everything has been copied to stdout, or stdout has been closed, whichever happens first. In your Haskell example, the input (/dev/zero) never ends, but the stdout pipe (from CreatePipe) is closed as soon as the Haskell process exits, so cat can no longer write and terminates.

You can test this behavior by using less and pressing q to exit early:

cat < /dev/urandom | hexdump -C | less

will terminate cat despite the input stream being infinite.
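A rough Haskell equivalent of that experiment is to close the read end of cat's stdout pipe by hand. This is a sketch under assumptions: catUntilStdoutCloses is a made-up name, a Unix-like system with cat and /dev/zero is assumed, and the exact exit code reported for cat is not something to rely on.

```haskell
import System.Exit (ExitCode (..))
import System.IO (hClose)
import System.Process

-- Hypothetical helper: start `cat /dev/zero`, close the read end of its
-- stdout pipe, and report how cat exited.
catUntilStdoutCloses :: IO ExitCode
catUntilStdoutCloses = do
  (_, mOut, _, h) <-
    createProcess (proc "cat" ["/dev/zero"]) { std_out = CreatePipe }
  mapM_ hClose mOut    -- nobody can read cat's output any more
  waitForProcess h     -- returns once cat's write fails and it exits

main :: IO ()
main = catUntilStdoutCloses >>= print
```

Even though the input never ends, waitForProcess returns promptly here, because cat cannot keep writing to a pipe that has no reader.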

Now finally, for your main question: Yes, you are responsible for the lifetime of the processes you spawn. Haskell won't automatically shut them down for you. It will only close the handles it created, which some processes will interpret as a sign to shut down and others will not.

And yes, the ProcessHandle h is indeed the way you can force-kill the process, using either terminateProcess or cleanupProcess. There is no built-in way to gracefully terminate the process, see this issue, other than indirectly by closing the handles for processes that support that. You can, however, send a signal using the unix package.
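As an illustration of the signal route, here is a small sketch that looks up the child's PID with getPid from System.Process and sends it SIGTERM via System.Posix.Signals from the unix package. The name terminateGracefully is invented for this example, the code is Unix-only, and whether SIGTERM counts as "graceful" depends on how the child handles it.

```haskell
import System.Exit (ExitCode (..))
import System.Posix.Signals (sigTERM, signalProcess)
import System.Process

-- Hypothetical helper: ask the child to stop with SIGTERM, then wait for it.
terminateGracefully :: ProcessHandle -> IO ExitCode
terminateGracefully h = do
  mpid <- getPid h                      -- Nothing if the process was already reaped
  case mpid of
    Just pid -> signalProcess sigTERM pid
    Nothing  -> return ()
  waitForProcess h                      -- blocks until the child has exited

main :: IO ()
main = do
  -- `sleep 60` stands in for any long-running child process.
  (_, _, _, h) <- createProcess (proc "sleep" ["60"])
  terminateGracefully h >>= print
```

A child that ignores or delays SIGTERM would make waitForProcess block, so a real program would probably want a timeout followed by a harsher signal.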

One standard way of handling cleanup steps like this is using bracket which will ensure that the cleanup step happens both on normal exit and in case of exceptions.

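import Control.Exception (bracket)
import System.IO (Handle)
-- System.Process already exports a withCreateProcess (with a curried callback),
-- so hide it to avoid a name clash with the definition below.
import System.Process hiding (withCreateProcess)

-- | Create a process; force-terminate it and close its handles at the end of the local scope
withCreateProcess :: CreateProcess -> ((Maybe Handle, Maybe Handle, Maybe Handle, ProcessHandle) -> IO a) -> IO a
withCreateProcess args = bracket (createProcess args) cleanupProcess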
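For a usage example, note that System.Process ships its own withCreateProcess, whose callback takes the three handles and the ProcessHandle as four separate arguments rather than as a tuple. The sketch below uses that library version; echoViaCat is a hypothetical name, and cat merely stands in for a long-running child such as socat.

```haskell
import System.IO (hFlush, hGetLine, hPutStrLn)
import System.Process

-- Hypothetical helper: send one line through `cat` and read it back.
-- When the callback returns (or throws), the child is terminated and
-- its handles are closed.
echoViaCat :: String -> IO String
echoViaCat msg =
  withCreateProcess (proc "cat" []) { std_in = CreatePipe, std_out = CreatePipe } $
    \mIn mOut _ _ ->
      case (mIn, mOut) of
        (Just i, Just o) -> do
          hPutStrLn i msg
          hFlush i          -- make sure cat actually receives the line
          hGetLine o        -- cat copies it straight back to its stdout
        _ -> ioError (userError "expected pipes for stdin and stdout")

main :: IO ()
main = echoViaCat "hello" >>= putStrLn
```

Because the cleanup runs on both normal exit and exceptions, no cat process should be left behind after echoViaCat returns, which is the property the original socat program was missing.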