Why won't my program accept the piped output of another program properly?

I have a C program compiled from three .c files. It prints squares to standard output, based on the x and y sizes I have hard-coded in main. The relevant code is below:

void    rush(int x, int y);

int     main(void)
{
    rush(3, 3);
    return (0);
}

Running the executable like so:

./a.out

gives the following:

o-o
| |
o-o

and changing the parameters passed to the rush function to (5, 5) yields the following:

o---o
|   |
|   |
|   |
o---o

You get the idea. Each line ends with a \n, so the next line is printed on its own row. I also have a test program, a simple main that just prints the value of argc, because I wanted to see what piping such input into it would give. The second program looks like this:

#include <stdio.h>

int     main(int argc, char **argv)
{
    printf("argc value is: %d\n", argc);
    return (0);
}

Running the following commands:

./a.out | ./test

I get the following output:

argc value is: 1

This didn't make sense to me at first, but then I remembered that some commands need xargs to accept input from stdin properly. Using xargs, with (5, 5) as the parameters in the main:

./a.out | xargs ./test

resulted in:

argc value is: 9

Thus I have two questions. Is there a way to do this without needing xargs and can be done in the c files themselves? And knowing the input to the test file, why is argc == 9? How does the program separate out a string in that format and decide what to put in the array?

Solution

This will be long, so grab your favourite drink. Don't just skip to the answers after the break.

First, examine the command-line arguments supplied to a program, say args.c:

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int  i;
    printf("argc = %d\n", argc);
    for (i = 0; i < argc; i++)
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
    return EXIT_SUCCESS;
}

Compile that using your favourite C compiler; I use gcc:

gcc -Wall -O2 args.c -o args

If you run, say,

./args one two

it will output

argc = 3
argv[0] = "./args"
argv[1] = "one"
argv[2] = "two"

All Unixes have a command line utility or shell built-in printf that works much like the C printf() standard library function does. We can run for example

printf 'Hello, world!\nSecond line\nThird line\n'

and we'll see

Hello, world!
Second line
Third line

Now, if we connect the two with a pipe,

printf 'Hello, world!\nSecond line\nThird line\n' | ./args

we get

argc = 1
argv[0] = "./args"

because there were no parameters to ./args, and the above args.c ignores standard input completely.

The xargs utility reads its standard input, and then executes its own command-line arguments as a command, adding the input it read as additional parameters. It is highly configurable, too. If you run

printf 'Hello, world!\nSecond line\nThird line\n' | xargs ./args

you'll get

argc = 7
argv[0] = "./args"
argv[1] = "Hello,"
argv[2] = "world!"
argv[3] = "Second"
argv[4] = "line"
argv[5] = "Third"
argv[6] = "line"

because xargs turns each token in the input, separated by whitespace, into a command-line argument. If we tell xargs to turn each input line into a separate argument, by using the -d SEPARATOR option (supported by GNU xargs) with newline as the separator:

printf 'Hello, world!\nSecond line\nThird line\n' | xargs -d '\n' ./args

we get

argc = 4
argv[0] = "./args"
argv[1] = "Hello, world!"
argv[2] = "Second line"
argv[3] = "Third line"

If we tell xargs to add at most two arguments per command executed, by adding the -n 2 option,

printf 'Hello, world!\nSecond line\nThird line\n' | xargs -d '\n' -n 2 ./args

we'll get

argc = 3
argv[0] = "./args"
argv[1] = "Hello, world!"
argv[2] = "Second line"
argc = 2
argv[0] = "./args"
argv[1] = "Third line"

This output means that our ./args was actually executed twice. The first run was effectively ./args 'Hello, world!' 'Second line', and the second was ./args 'Third line'.

Another important option to xargs is -r, which tells it to not run the command without any additional arguments:

true | xargs -r ./args

does not output anything, because xargs sees no input, and the -r option tells it to not run our args program if there are no additional arguments.

When manipulating file names or paths, the -0 (dash zero) option tells xargs that the input separator is the nul character, \0, which in C terminates strings. If we use that as the separator in the input to xargs, even strings with newlines and such will be correctly split into arguments. For example:

printf 'One thing\non two lines\0Second thing' | xargs -0 ./args

will output

argc = 3
argv[0] = "./args"
argv[1] = "One thing
on two lines"
argv[2] = "Second thing"

which is exactly what one would want, if processing file names or paths in a robust manner.

Is there a way to do this without needing xargs and can be done in the c files themselves?

Of course: just read standard input. xargs is almost certainly written in C itself on all Unixy systems.

How does [xargs] separate out a string in that format and decide what to put in the array?

The short answer is that it depends on the options used, because xargs is a pretty powerful little tool.

The full answer is, look at the sources. The source for the GNU xargs (part of findutils) is here, and the source for FreeBSD version is here.

The code answer depends on whether you can use POSIX.1 or not, specifically getline() or getdelim(). If you have a single-character separator (be it any single-byte character at all, even nul), you can use getdelim() to read each "parameter" from the input as a separate string. This is what I'd do, but note that it is not standard C; it is a POSIX solution. (Nowadays, if you have a maintained Unixy computer, it is almost certain to have POSIX.1 support built into its C library.)

Why is argc == 9?

If we duplicate your input using printf 'o---o\n|   |\n|   |\n|   |\no---o\n' and pipe it to xargs ./args, the output is as expected,

argc = 9
argv[0] = "./args"
argv[1] = "o---o"
argv[2] = "|"
argv[3] = "|"
argv[4] = "|"
argv[5] = "|"
argv[6] = "|"
argv[7] = "|"
argv[8] = "o---o"

i.e. each part of your ASCII art, split at whitespace, is supplied as a separate command-line parameter. If we pipe it to xargs -d '\n' ./args, the output is

argc = 6
argv[0] = "./args"
argv[1] = "o---o"
argv[2] = "|   |"
argv[3] = "|   |"
argv[4] = "|   |"
argv[5] = "o---o"

If you had written that initial args.c program for yourself, you probably could have found the answer to your questions yourself via exploration. That is what makes programming so powerful: you can write tools that help you understand the problems you wish to solve. Applying the Unix philosophy and the KISS principle means those tools are often quite simple to write as well. Just write them well in the first place, so you can trust their results, and don't need to rewrite them too often.