
Simplest way to get a complete list of all the UTF-8 whitespace characters in PHP


In PHP, what's the most elegant way to get a complete list (as an array of strings) of all the Unicode whitespace characters, encoded in UTF-8?

I need that to generate test data.


Solution

  • This email (archived here) contains a list of Unicode whitespace characters encoded in UTF-8, UTF-16, and HTML.

    In the archived copy, look for the 'utf8_whitespace_table' function. The table it defines is:

    static $whitespace = array(
        "SPACE" => "\x20",
        "NO-BREAK SPACE" => "\xc2\xa0",
        "OGHAM SPACE MARK" => "\xe1\x9a\x80",
        "EN QUAD" => "\xe2\x80\x80",
        "EM QUAD" => "\xe2\x80\x81",
        "EN SPACE" => "\xe2\x80\x82",
        "EM SPACE" => "\xe2\x80\x83",
        "THREE-PER-EM SPACE" => "\xe2\x80\x84",
        "FOUR-PER-EM SPACE" => "\xe2\x80\x85",
        "SIX-PER-EM SPACE" => "\xe2\x80\x86",
        "FIGURE SPACE" => "\xe2\x80\x87",
        "PUNCTUATION SPACE" => "\xe2\x80\x88",
        "THIN SPACE" => "\xe2\x80\x89",
        "HAIR SPACE" => "\xe2\x80\x8a",
        "ZERO WIDTH SPACE" => "\xe2\x80\x8b",
        "NARROW NO-BREAK SPACE" => "\xe2\x80\xaf",
        "MEDIUM MATHEMATICAL SPACE" => "\xe2\x81\x9f",
        "IDEOGRAPHIC SPACE" => "\xe3\x80\x80",
    );
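
If you would rather not maintain the byte escapes by hand, the same table can be rebuilt from code points. The following is only a minimal sketch: the helper names `utf8_chr` and `utf8_whitespace_list` and the variables `$tableRanges` and `$whitespaceList` are made up for this example and do not come from the archived email. It assumes PHP 7.1 or later and uses no extensions.

```php
<?php
// Encode a single Unicode code point as a UTF-8 byte string.
// (Hypothetical helper; surrogates and out-of-range values are not checked.)
function utf8_chr(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Expand inclusive [first, last] code point ranges into a flat
// array of UTF-8 encoded strings. (Hypothetical helper.)
function utf8_whitespace_list(array $ranges): array
{
    $chars = [];
    foreach ($ranges as [$first, $last]) {
        for ($cp = $first; $cp <= $last; $cp++) {
            $chars[] = utf8_chr($cp);
        }
    }
    return $chars;
}

// The same 18 code points as the table above, in the same order.
$tableRanges = [
    [0x0020, 0x0020], // SPACE
    [0x00A0, 0x00A0], // NO-BREAK SPACE
    [0x1680, 0x1680], // OGHAM SPACE MARK
    [0x2000, 0x200B], // EN QUAD .. ZERO WIDTH SPACE
    [0x202F, 0x202F], // NARROW NO-BREAK SPACE
    [0x205F, 0x205F], // MEDIUM MATHEMATICAL SPACE
    [0x3000, 0x3000], // IDEOGRAPHIC SPACE
];

$whitespaceList = utf8_whitespace_list($tableRanges);
```

For test data it is worth knowing that this table and the Unicode White_Space property differ slightly: the property also covers U+0009 to U+000D, U+0085, U+2028 and U+2029, and it does not include ZERO WIDTH SPACE (U+200B). Adding or removing ranges in `$tableRanges` is enough to cover either definition.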