OS: Ubuntu 20.04, CentOS 8, macOS Catalina 10.15.7
Language: C, C++
Compiler: gcc (most recent versions for each OS)
I am using the POSIX library function wordexp to get shell-like expansion of strings.
The expansion works fine with one exception: when I set the IFS environment variable to something other than whitespace, for example ':', word splitting is unaffected. It continues to be done on whitespace only, regardless of the IFS value.
Man page for wordexp for Linux https://man7.org/linux/man-pages/man3/wordexp.3.html states:
This is why I expected wordexp to behave the same way as bash in this respect.
In bash, on all the listed OSes, I get the correct and expected result when changing the character set used for splitting:
Using the default (IFS is not set)
read -a words <<<"1 2:3 4:5"
for word in "${words[@]}"; do echo "$word"; done
correctly splits on space and produces the result:
1
2:3
4:5
while setting IFS to ':'
IFS=':' read -a words <<<"1 2:3 4:5"
for word in "${words[@]}"; do echo "$word"; done
correctly splits on ':' and produces the result:
1 2
3 4
5
But running the code below yields the same result regardless of whether the IFS environment variable is set:
C Code:
#include <stdio.h>
#include <wordexp.h>
#include <stdlib.h>

/* Expand str with wordexp() and print every resulting word on its own line. */
static void expand(char const *title, char const *str)
{
    printf("%s input: %s\n", title, str);
    wordexp_t exp;
    int rcode = 0;
    if ((rcode = wordexp(str, &exp, WRDE_NOCMD)) == 0) {
        printf("output:\n");
        for (size_t i = 0; i < exp.we_wordc; i++)
            printf("%s\n", exp.we_wordv[i]);
        wordfree(&exp);
    } else {
        printf("expand failed %d\n", rcode);
    }
}

int main(void)
{
    char const *str = "1 2:3 4:5";
    expand("No IFS", str);
    int rcode = setenv("IFS", ":", 1);
    if (rcode != 0) {
        /* perror() appends ": " and the error text itself */
        perror("setenv IFS failed");
        return 1;
    }
    expand("IFS=':'", str);
    return 0;
}
The result in all OSes is the same:
No IFS input: 1 2:3 4:5
output:
1
2:3
4:5
IFS=':' input: 1 2:3 4:5
output:
1
2:3
4:5
As a note, the snippet above was created for this post. I did test with more complex code that verified that the environment variable was indeed set properly.
I looked at the source code of the glibc wordexp implementation, available at https://code.woboq.org/userspace/glibc/posix/wordexp.c.html, and it appears that it does use $IFS, but perhaps inconsistently, or maybe this is a bug.
Specifically:
In the body of wordexp, which starts on line 2229, it does get the value of the IFS environment variable and processes it:
lines 2273 - 2276:
/* Find out what the field separators are.
* There are two types: whitespace and non-whitespace.
*/
ifs = getenv ("IFS");
But later in the function it does not seem to use the $IFS values for word separation. This looks like a bug unless "field separators" on line 2273 and "word separator" on line 2396 mean different things.
lines 2395 - 2398:
default:
/* Is it a word separator? */
if (strchr (" \t", words[words_offset]) == NULL)
{
But in any case the code seems to use only space or tab as a splitter, unlike bash, which respects the splitter values set in IFS.
Many thanks in advance for all your comments and insights!
The accepted answer hinted at how to split on the non-whitespace characters in $IFS: set $IFS, store the string you want to split in a temporary environment variable, and then call wordexp on a reference to that temporary variable. This is demonstrated in the updated code below.
While this behavior, which is visible in the source code, may not actually be a bug, it definitely looks like a questionable design decision to me…
Updated code:
#include <stdio.h>
#include <wordexp.h>
#include <stdlib.h>

/* Expand str with wordexp() and print every resulting word on its own line. */
static void expand(char const *title, char const *str)
{
    printf("%s input: %s\n", title, str);
    wordexp_t exp;
    int rcode = 0;
    if ((rcode = wordexp(str, &exp, WRDE_NOCMD)) == 0) {
        printf("output:\n");
        for (size_t i = 0; i < exp.we_wordc; i++)
            printf("%s\n", exp.we_wordv[i]);
        wordfree(&exp);
    } else {
        printf("expand failed %d\n", rcode);
    }
}

int main(void)
{
    char const *str = "1 2:3 4:5";
    expand("No IFS", str);
    int rcode = setenv("IFS", ":", 1);
    if (rcode != 0) {
        /* perror() appends ": " and the error text itself */
        perror("setenv IFS failed");
        return 1;
    }
    expand("IFS=':'", str);
    /* Put the string into a variable so that it goes through parameter expansion */
    rcode = setenv("FAKE", str, 1);
    if (rcode != 0) {
        perror("setenv FAKE failed");
        return 2;
    }
    expand("FAKE", "${FAKE}");
    return 0;
}
which produces the result:
No IFS input: 1 2:3 4:5
output:
1
2:3
4:5
IFS=':' input: 1 2:3 4:5
output:
1
2:3
4:5
FAKE input: ${FAKE}
output:
1 2
3 4
5
You're comparing apples to oranges. wordexp() splits a string up into individual tokens the same way the shell does. The shell builtin read doesn't follow the same algorithm; it just does word splitting. You should be comparing wordexp() to how the arguments to a script or shell function are parsed:
#!/bin/sh
printwords() {
    for arg in "$@"; do
        printf "%s\n" "$arg"
    done
}
echo "No IFS input: 1 2:3 4:5"
printwords 1 2:3 4:5
echo "IFS=':' input: 1 2:3 4:5"
IFS=:
printwords 1 2:3 4:5
This produces
No IFS input: 1 2:3 4:5
1
2:3
4:5
IFS=':' input: 1 2:3 4:5
1
2:3
4:5
just like the C program.
Now, for the interesting bit. I couldn't find it explicitly mentioned as such in the POSIX documentation with a quick scan, but the bash manual has this to say about word splitting:
Note that if no expansion occurs, no splitting is performed.
Let's try a version that does parameter expansion in its arguments:
#!/bin/sh
printwords() {
    for arg in "$@"; do
        printf "%s\n" "$arg"
    done
}
foo=2:3
printf "foo = %s\n" "$foo"
printf "No IFS input: 1 \$foo 4:5\n"
printwords 1 $foo 4:5
printf "IFS=':' input: 1 \$foo 4:5\n"
IFS=:
printwords 1 $foo 4:5
which, when run via shells like dash, ksh93 or bash (but not zsh unless you turn on the SH_WORD_SPLIT option), produces
foo = 2:3
No IFS input: 1 $foo 4:5
1
2:3
4:5
IFS=':' input: 1 $foo 4:5
1
2
3
4:5
As you can see, the argument that has a parameter was subject to field splitting, but not the literal one. Making the same change to the string in your C program and running foo=2:3 ./wordexp prints out the same thing.