awk, busybox, standards-compliance, posix-ere, mawk

Why do several Linux distros ship mawk by default even though it is not POSIX compliant?


mawk is not POSIX compliant because it does not support POSIX EREs.

To be precise, it does not support named character classes like [[:space:]] within its EREs, which are part of POSIX EREs.

Neither GNU awk nor BusyBox awk seems to have this problem.

I have run into this issue several times in my own awk scripts, because I really like [[:space:]] for matching horizontal tabs, spaces, and potentially other locale-specific whitespace with a single character class expression.
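For illustration, here is a minimal sketch of the usual workaround: spelling out the whitespace characters in an explicit bracket expression instead of relying on [[:space:]]. The helper name `count_fields_portable` and the sample input are made up for this example, and the sketch assumes that only space and horizontal tab matter, so it does not cover other locale-specific whitespace.

```shell
# Hypothetical helper: count the fields on each input line, splitting on
# runs of spaces and tabs with an explicit bracket expression.
# Assumption: only space and horizontal tab need to be treated as whitespace.
count_fields_portable() {
    awk '{ print split($0, parts, /[ \t]+/) }'
}

# Sample input with one tab and one space between three words.
printf 'a\tb c\n' | count_fields_portable
```

This gives up the locale awareness of [[:space:]], but it should behave the same across awk implementations.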

So why did several Linux distros choose to ship a non-POSIX-compliant implementation of such a prominent utility by default, even though POSIX-compliant ones are available?


Solution

  • Looking at http://archive.debian.org, it seems that:

    • mawk appeared around 1997 as 1.3.3
    • busybox appeared around 2002 as 0.60.2
    • busybox finally reached version 1 (1.1.3) in 2006

    I would imagine that mawk is still the default for one main reason:

    1. Inertia. It's been packaged as the default for a long time.

    Note that mawk is POSIX compliant (in a way). From its manpage:

    mawk conforms to the Posix 1003.2 (draft 11.3) definition of the AWK language

    Unfortunately that's not the version you care about...

    Given how hard it is even to get its version updated:

    (both still open, the latter since 2009!!), imagine how hard it would be to get Debian to replace it with something else entirely!

    I suspect there is also:

    2. It's really easy to install gawk (or your preferred implementation).
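If a script has to cope with whichever awk happens to be installed, one option is to probe for character class support at run time. The following is only a sketch: the function name `awk_has_posix_classes` is hypothetical, and it assumes that an implementation without named classes simply fails to match a tab with [[:space:]] rather than aborting with an error.

```shell
# Hypothetical helper: report whether the given awk binary (default: awk)
# treats [[:space:]] as a POSIX character class that matches a tab.
# Prints "yes" or "no".
awk_has_posix_classes() {
    printf 'a\tb\n' |
        "${1:-awk}" '{ print (($0 ~ /a[[:space:]]b/) ? "yes" : "no") }'
}

# Probe whatever "awk" resolves to on this system.
awk_has_posix_classes
```

Passing a binary name as the first argument (for example `awk_has_posix_classes mawk`) probes that implementation instead, provided it is installed.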