Make sed not buffer by lines


I'm not trying to prevent sed from block-buffering! I am looking to get it to not even line-buffer.

I am not sure if this is even possible at all.

Basically there is a big difference between the behavior of sed and that of cat when interacting with them from a raw pseudo-terminal: cat will immediately spit back the inserted characters when it receives them over STDIN, while sed even in raw mode will not.
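
To make the contrast concrete, here is a minimal sketch in C of the kind of copy loop a cat-like tool can use. The function name `copy_unbuffered` is made up for illustration, and real cat implementations differ in detail. The point is only that `read()` returns as soon as any bytes are available, so nothing waits for a newline.

```c
#include <unistd.h>

/* Hypothetical helper: copy in_fd to out_fd without stdio buffering.
   Returns the total number of bytes copied, or -1 on error. */
static long copy_unbuffered(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    /* read() returns as soon as at least one byte is available,
       so a single typed character is forwarded right away. */
    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {   /* write() may accept only part of the data */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

A tool that has to transform its input cannot always do this, because some bytes may still be part of a match that has not been decided yet.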

A thought experiment could be carried out: given a simple sed command such as s/abc/zzz/g, sending a stream of input to sed like 123ab means that sed at best can provide over standard output the characters 123, because it does not yet know if a c will arrive and cause the result string to be 123zzz, while any other character would have it print exactly what came in (allowing it to "catch up", if you will). So in a way it's obvious why cat does respond immediately; it can afford to.

So of course that's how it would work in an ideal world where sed's authors actually cared about this kind of a use case.

I suspect that that is not the case. In reality, through my not too terribly exhaustive methods, I see that sed will line buffer no matter what (which allows it to always be able to figure out whether to print the 3 z's or not), unless you tell it that you care about matching your regexes past/over newlines, in which case it will just buffer the whole damn thing before providing any output.

My ideal solution is to find a sed that will spit out all the text that it has already finished parsing, without waiting till the end of line to do so. In my little example above, it would instantly spit back the characters 1, 2, and 3, and while a and b are being entered (typed), it says nothing, till either a c is seen (prints zzz), or any other character X is seen, in which case abX is printed, or in the case of EOF ab is printed.
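
The behavior described above can be sketched in C for the special case of a fixed literal pattern. This is not a regex engine and not how sed works. The names `stream_sub`, `stream_sub_feed`, `stream_sub_finish` and the `MAX_PAT` limit are all invented for this illustration. The idea is to hold back only the bytes that could still turn into a match and to emit everything else immediately.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PAT 64  /* assumption: the pattern is shorter than this */

/* State for one literal substitution; all names here are hypothetical. */
struct stream_sub {
    const char *pat;  /* literal to find, e.g. "abc" (non-empty, < MAX_PAT bytes) */
    const char *rep;  /* text to emit in its place, e.g. "zzz" */
    size_t held;      /* bytes held back; they always equal pat[0..held) */
};

/* Feed one input byte and write out everything that is now decided. */
static void stream_sub_feed(struct stream_sub *s, int c, FILE *out)
{
    char buf[MAX_PAT];
    size_t plen = strlen(s->pat);
    size_t len = s->held + 1;
    size_t k;

    /* Candidate text: the held prefix of pat followed by the new byte. */
    memcpy(buf, s->pat, s->held);
    buf[s->held] = (char)c;

    /* Find the smallest k such that buf[k..len) is still a prefix of pat.
       The first k bytes can no longer start a match, so they are emitted. */
    for (k = 0; k < len; k++)
        if (memcmp(buf + k, s->pat, len - k) == 0)
            break;
    fwrite(buf, 1, k, out);
    s->held = len - k;

    /* A complete match: emit the replacement and start over. */
    if (s->held == plen) {
        fputs(s->rep, out);
        s->held = 0;
    }
    fflush(out);
}

/* At end of input, whatever is still held back is emitted unchanged. */
static void stream_sub_finish(struct stream_sub *s, FILE *out)
{
    fwrite(s->pat, 1, s->held, out);
    s->held = 0;
    fflush(out);
}
```

A driver would call `stream_sub_feed` for every byte it gets from an unbuffered input and `stream_sub_finish` at EOF. Doing the same for arbitrary regular expressions needs a real matcher that knows when a match can no longer be extended, which is the hard part.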

Am I SOL? Should I just incrementally implement my Perl code with the features I want, or is there still some chance that this sort of magically delicious functionality can be got through some kind of configuration?

See another question of mine for more details on why I want this.

One potential workaround is to manually split the input into groups across calls to sed (or, since I'm already working in a Perl script, Perl's regex replacement operators), so that I do the flushing by hand. But this cannot achieve the same level of responsiveness, because I would have to work out from the expression itself where the "buffering" points should be, rather than having a regex parser do it automatically.


Solution

  • There is a tool that matches an input stream against multiple regular expressions in parallel and acts as soon as it decides on a match. It's not sed. It's lex. Or its widely used free implementation, flex.

    To make this demonstration work, I had to define a YY_INPUT macro, because flex was line-buffering input by default. Even with no buffering at the stdio level, and even in "interactive" mode, there is an assumption that you don't want to process less than a line at a time.

    So this is probably not portable to other versions of lex.

    %{
    #include <stdio.h>
    
    /* Read one character at a time so the scanner never waits for a full line. */
    #define YY_INPUT(buf,result,max_size) \
       { \
       int c = getchar(); \
       result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
       }
    %}
    
    %%
    
    abc   { fputs("zzz", stdout); fflush(stdout); }
    .|\n  { fputs(yytext, stdout); fflush(stdout); }
    
    %%
    
    int main(void)
    {
      setbuf(stdin, 0);   /* no stdio buffering on input */
      yylex();
      return 0;
    }
    

    Usage: put that program into a file called abczzz.l and run the following (on some systems flex's library is linked with -lfl rather than -ll):

    flex --always-interactive -o abczzz.c abczzz.l
    cc abczzz.c -ll -o abczzz
    for ch in a b c 1 2 3 ; do printf %s "$ch" ; sleep 1 ; done | ./abczzz ; echo