Unexpected result of greater than or less than comparison on PHP 8


The expression below returns false on PHP 7 but true on PHP 8. Could someone explain why this is happening?

var_dump("U0M262" > 100000);

Solution

  • There is no obviously correct result for a comparison between a string and a number. In many languages, it would just give an error; in others, including PHP, the language tries to make sense of it by converting both operands to the same type, but this involves a judgement of which type to "prefer".


    Historically, PHP has preferred comparing numbers to comparing strings: it treated "U0M262" > 100000 as (int)"U0M262" > 100000. Since (int)"U0M262" has no obvious value, it is evaluated as 0, and the expression becomes 0 > 100000, which is false.
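    The legacy conversion can be checked directly. This is a minimal sketch; the variable names are illustrative and not part of the question:

```php
<?php
// Casting a non-numeric string to int yields 0 (on PHP 7 and PHP 8 alike).
$asInt = (int)"U0M262";
var_dump($asInt);          // int(0)

// So the PHP 7 comparison is effectively this one:
$legacyResult = $asInt > 100000;
var_dump($legacyResult);   // bool(false)
```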

    As of PHP 8, this behaviour has changed: PHP now uses a numeric comparison only when the string is a "numeric string", such as "42", which clearly "looks like" 42.
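    One way to see which strings count as numeric is the built-in is_numeric() function. The sample values below are illustrative assumptions, chosen so that the result is the same on PHP 7 and PHP 8:

```php
<?php
// Sample strings (illustrative): the first three are numeric strings,
// the last two are not. "42abc" only *starts* with a number.
$samples = ["42", "-3.5", "1e3", "U0M262", "42abc"];

$results = [];
foreach ($samples as $s) {
    $results[] = is_numeric($s);
    printf("%-10s %s\n", var_export($s, true), is_numeric($s) ? "numeric" : "non-numeric");
}
```

    Only the strings reported as numeric take part in a numeric comparison on PHP 8; the others are compared as strings.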

    Since "U0M262" doesn't fit the requirements for a numeric string, "U0M262" > 100000 is now treated as "U0M262" > (string)100000. This compares the two strings byte by byte, starting from the first character. Since "U" (byte 0x55) comes after "1" (byte 0x31) in ASCII (and in any ASCII-derived encoding, including UTF-8), the result is true.
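    The byte-wise comparison can be reproduced explicitly with ord() and strcmp(), which behave the same way on PHP 7 and PHP 8. A small sketch, with illustrative variable names:

```php
<?php
// ord() returns the value of the first byte of a string.
$firstLeft  = ord("U0M262");          // 85 (0x55, "U")
$firstRight = ord((string)100000);    // 49 (0x31, "1")

// strcmp() does a byte-wise comparison; a positive result means the
// left string sorts after the right one.
$order = strcmp("U0M262", (string)100000);

var_dump($firstLeft, $firstRight, $order > 0);   // int(85) int(49) bool(true)
```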


    Because of how ASCII (and compatible encodings such as UTF-8) is arranged, the following holds when a non-numeric string is compared with a number whose string form starts with a digit (any non-negative integer, for example):

    • A string starting with a control character or space will be "less than" any such number
    • A string starting with a letter will be "more than" any such number
    • A string starting with any of ! " # $ % & ' ( ) * + , - . / will be "less than" any such number
    • For a string starting with a digit, you need to look at the individual bytes
    • Any other string will be "more than" any such number
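    The cases in the list above can be tried out with a small helper. sortsAfterNumber() is a hypothetical name introduced here for illustration; it uses strcmp() to apply the byte-wise comparison explicitly, so it gives the same answers on PHP 7 and PHP 8:

```php
<?php
// Hypothetical helper: compares a string with the string form of an
// integer, byte by byte, as PHP 8 does for non-numeric strings.
function sortsAfterNumber(string $s, int $n): bool
{
    return strcmp($s, (string)$n) > 0;
}

var_dump(sortsAfterNumber("\tabc", 100000));  // bool(false): control character
var_dump(sortsAfterNumber("abc", 100000));    // bool(true):  letter
var_dump(sortsAfterNumber("#abc", 100000));   // bool(false): punctuation before "0"
var_dump(sortsAfterNumber("1abc", 100000));   // bool(true):  second byte "a" sorts after "0"
var_dump(sortsAfterNumber("0abc", 100000));   // bool(false): first byte "0" sorts before "1"
var_dump(sortsAfterNumber("~abc", 100000));   // bool(true):  other character
```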

    As ever, you can tell PHP which comparison you intended, and get the same behaviour in all versions, by using explicit casts:

    var_dump((int)"U0M262" > (int)100000); // bool(false)
    var_dump((string)"U0M262" > (string)100000); // bool(true)
    

    (Obviously, this makes no sense if you're hard-coding both sides anyway, but assuming one or both is a variable, this is how you'd do it.)
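    When one side is a variable, one possible approach is to check that it is numeric before comparing, rather than relying on a cast that silently turns bad input into 0. The sketch below is only an illustration: exceedsLimit() and its parameters are hypothetical names, and throwing an exception is one choice among several.

```php
<?php
// Hypothetical helper: compare numerically only when the input really is
// a numeric string; otherwise refuse rather than guess.
function exceedsLimit(string $input, int $limit): bool
{
    if (!is_numeric($input)) {
        throw new InvalidArgumentException("Not a numeric string: " . $input);
    }
    return (float)$input > $limit;
}

var_dump(exceedsLimit("150000", 100000));   // bool(true)
var_dump(exceedsLimit("42", 100000));       // bool(false)

try {
    exceedsLimit("U0M262", 100000);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";            // Not a numeric string: U0M262
}
```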