Search code examples
shellscriptinglocale

Consistent implementation of tr?


I have a ksh script that generates a long, random string using /dev/urandom and tr:

STRING="$(cat /dev/urandom|tr -dc 'a-zA-Z0-9-_'|fold -w 64 |head -1)"

On the Linux and AIX servers where I used this it resulted in 64 characters of upper and lower case alpha chars, digits, dash and underscore characters. Example:

W-uch3_4fbnk34u2nc08w_nj23n089023ncNjxz979823n23-n88h30pmLCxkMKj

When I used the script on Solaris the ranges were interpreted as literals and it resulted in strings from the set aAzZ09-_. Example:

AA0z9_aZ-a-z00aZ9_azAZa0zZza9-Az0-_za-9aa0az_a0z-0a0z000-A9Z_0a

Oddly, on this Solaris server the man page for tr indicates that the syntax used should have produced the desired result.

The idea is to use /dev/urandom to produce a pseudo-random string from which we extract characters so that the result a) does not contain spaces and b) does not contain shell special characters. The string will be used on the command line as an argument later on in the script. We don't want to use classes like :alnum: because locale can convert these into multi-byte values that don't work on the command line. This ksh one-liner did the trick perfectly on a great many installations until we got to Solaris.

We have temporarily converted this to a somewhat nasty Perl regex. Is there a syntax for tr or some other utility or ksh built-in that will perform this task consistently across UNIX variants and is universally installed? Doesn't have to be a one-liner but simplicity is appreciated.

Update: We tried the Locale settings with no luck. Waiting on results of using xpg6 version.

$ uname -a
SunOS hostname 5.10 Generic_142900-04 sun4u sparc SUNW,SPARC-Enterprise
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
0-a9-z9a_zzZAa_a_0az-9_z0a_90Z_9az09aZzZAa-9aa_-__za0ZA9_ZzzZazA
$ set | grep '^L[AC]'
LANG=C
LC_ALL=C
LC_COLLATE=en_US
LC_CTYPE=en_US
LC_MESSAGES=en_US
LC_MONETARY=en_US
LC_NUMERIC=en_US
LC_TIME=en_US
$ export LC_CTYPE="$LC_ALL" LC_MESSAGES="$LC_ALL"
$ set | grep '^L[AC]'
LANG=C
LC_ALL=C
LC_COLLATE=en_US
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=en_US
LC_NUMERIC=en_US
LC_TIME=en_US
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
0900z9az99_a0za09__0zA0_Z--Z_-Aa-AaA9zAZz-Aa90A00z__ZzA9A-Z0aA_-
$ unset LC_ALL; export LC_COLLATE=C LC_NUMERIC=C LC_TIME=C
$ set | grep '^L[AC]'
LANG=C
LC_COLLATE=C
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=en_US
LC_NUMERIC=C
LC_TIME=C
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
_AA9aA_Za-A0-AZa_A-0ZA--a_za-a9zZZz__a0az_-0A-9-0aA-0za00A-__9-0
$ unset LANG LC_COLLATE LC_NUMERIC LC_TIME
$ set | grep '^L[AC]'
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=en_US
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
_-_9zz9Z-Z-Z-Z_0_a9zzzZZaAa--9_zAZaaAZz-ZaAZ09Z-_z-zz09ZZAzAz0Z0
$ unset LC_CTYPE LC_MESSAGES LC_MONETARY
$ set | grep '^L[AC]'
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
_0aAa9_Z_a_Z--_Az-aa0ZA0ZzZ-9Aa9-Z0--0A_Z0Zaz-AA_Zz0z---Z_99z_a9
$ export LANG=C LC_ALL=C LC_COLLATE=C LC_CTYPE=C LC_MESSAGES=C LC_MONETARY=C LC_NUMERIC=C LC_TIME=C
$ set | grep '^L[AC]'
LANG=C
LC_ALL=C
LC_COLLATE=C
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=C
LC_NUMERIC=C
LC_TIME=C
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
Za_000z9aa--aA00zAAZza0AA90090--z0a00_zZ9ZA0_---aZZ09a0ZA0_0zZaa
$ cat /dev/urandom | tr -dc "[a-z][A-Z][0-9]-_" | fold -w 64 | head -1 | sed 's/^-/_/'
x7dni9gIXVF6AHQc3B-H6hjnBVHChJ9zM-z5EQ5UEruATI_NNFaCoVLOqM6gVaT5
$

Of course, on Linux that last version spits out square brackets.


Solution

  • If you set your path to /usr/xpg6/bin/ then it'll work as expected The locale seems to have no affect here. A cross platform hack is:

    tr -dc '[a-z][A-Z][0-9]_-' < /dev/urandom | tr -d '][' | fold -w64 | head -n1