
Are there different types of double quotes in utf-8 (PHP, str_replace)?


In PHP 5.3, I am trying to replace double quotes in a string like this:

$bar = str_replace('"','\'',$foo);

But some quotes that are saved in the UTF-8 database are not being replaced, although they look perfectly normal:

"Some text"

Are there different character types I have to search for? If so, which are they?


Solution

  • There are many characters that look like quotation marks, but most of them are used infrequently. The three that are used most often are:

    "   U+0022 QUOTATION MARK
    “   U+201C LEFT DOUBLE QUOTATION MARK
    ”   U+201D RIGHT DOUBLE QUOTATION MARK
    

    Some rarer ones are the FULLWIDTH QUOTATION MARK, the DITTO MARK, the DOUBLE PRIME, the DOUBLE PRIME QUOTATION MARK, and so on. The Unicode.org "confusables" tool finds 15 characters similar to the ASCII double quote (").

    Why don't you copy and paste the offending character here so we can identify it? Alternatively, you could use the HEX() function in a database query to get the hexadecimal encoding of the character; that is another way of identifying it.

    Update: The unicode.org confusables utility seems to be down, but the data is available as a text file. The current list of characters that are "confusable" with the double quote is:

    1CD3 ;  0027 0027 ; MA  #* ( ᳓ → '' ) VEDIC SIGN NIHSHVASA → APOSTROPHE, APOSTROPHE # →″→→"→
    0022 ;  0027 0027 ; MA  #* ( " → '' ) QUOTATION MARK → APOSTROPHE, APOSTROPHE   # 
    FF02 ;  0027 0027 ; MA  #* ( " → '' ) FULLWIDTH QUOTATION MARK → APOSTROPHE, APOSTROPHE # →”→→"→
    201C ;  0027 0027 ; MA  #* ( “ → '' ) LEFT DOUBLE QUOTATION MARK → APOSTROPHE, APOSTROPHE   # →"→
    201D ;  0027 0027 ; MA  #* ( ” → '' ) RIGHT DOUBLE QUOTATION MARK → APOSTROPHE, APOSTROPHE  # →"→
    201F ;  0027 0027 ; MA  #* ( ‟ → '' ) DOUBLE HIGH-REVERSED-9 QUOTATION MARK → APOSTROPHE, APOSTROPHE    # →“→→"→
    2033 ;  0027 0027 ; MA  #* ( ″ → '' ) DOUBLE PRIME → APOSTROPHE, APOSTROPHE # →"→
    2036 ;  0027 0027 ; MA  #* ( ‶ → '' ) REVERSED DOUBLE PRIME → APOSTROPHE, APOSTROPHE    # →‵‵→
    3003 ;  0027 0027 ; MA  #* ( 〃 → '' ) DITTO MARK → APOSTROPHE, APOSTROPHE   # →″→→"→
    05F4 ;  0027 0027 ; MA  #* ( ‎״‎ → '' ) HEBREW PUNCTUATION GERSHAYIM → APOSTROPHE, APOSTROPHE   # →"→
    02DD ;  0027 0027 ; MA  #* ( ˝ → '' ) DOUBLE ACUTE ACCENT → APOSTROPHE, APOSTROPHE  # →"→
    02BA ;  0027 0027 ; MA  # ( ʺ → '' ) MODIFIER LETTER DOUBLE PRIME → APOSTROPHE, APOSTROPHE  # →"→
    02F6 ;  0027 0027 ; MA  #* ( ˶ → '' ) MODIFIER LETTER MIDDLE DOUBLE ACUTE ACCENT → APOSTROPHE, APOSTROPHE   # →˝→→"→
    02EE ;  0027 0027 ; MA  # ( ˮ → '' ) MODIFIER LETTER DOUBLE APOSTROPHE → APOSTROPHE, APOSTROPHE # →″→→"→
    05F2 ;  0027 0027 ; MA  # ( ‎ײ‎ → '' ) HEBREW LIGATURE YIDDISH DOUBLE YOD → APOSTROPHE, APOSTROPHE  # →‎יי‎→
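If you would rather identify the character from PHP than from the database, a small sketch like the following prints the UTF-8 bytes of each character in hex. The helper name `utf8_char_hex` is made up for this example, and it assumes the string is valid UTF-8 (for invalid input `preg_split` returns false). For reference, the plain quote shows up as `22`, U+201C as `e2809c`, U+201D as `e2809d`, and U+FF02 as `efbc82`.

```php
<?php
// Hypothetical helper: return the UTF-8 bytes of each character of $text,
// hex-encoded, so a look-alike quote can be told apart from the ASCII one.
function utf8_char_hex($text)
{
    // Split into characters rather than bytes: the /u modifier makes PCRE
    // treat the subject as UTF-8, and PREG_SPLIT_NO_EMPTY drops the empty
    // pieces at both ends.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return false; // not valid UTF-8
    }
    return array_map('bin2hex', $chars);
}

$foo = "\xE2\x80\x9CSome\xE2\x80\x9D"; // “Some” with curly quotes
echo implode(' ', utf8_char_hex($foo)), "\n";
// e2809c 53 6f 6d 65 e2809d
```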
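Since `str_replace()` also accepts an array of search strings, the code from the question can be extended to cover the look-alikes. This is only a sketch: the function name `replace_double_quotes` is hypothetical, and which characters to include is a judgment call (here the three common ones plus a few from the list above). The characters are written as UTF-8 byte escapes so the result does not depend on the encoding of the PHP source file, and this also works on PHP 5.3, which lacks the `"\u{201C}"` escape introduced in PHP 7.0. Plain byte matching is safe for UTF-8 because the byte sequence of one character can never match inside another character.

```php
<?php
// Hypothetical helper: replace the ASCII double quote and some common
// Unicode look-alikes with $replacement. Assumes $text is UTF-8.
function replace_double_quotes($text, $replacement = "'")
{
    $quotes = array(
        "\x22",         // U+0022 QUOTATION MARK
        "\xE2\x80\x9C", // U+201C LEFT DOUBLE QUOTATION MARK
        "\xE2\x80\x9D", // U+201D RIGHT DOUBLE QUOTATION MARK
        "\xE2\x80\x9F", // U+201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
        "\xE2\x80\xB3", // U+2033 DOUBLE PRIME
        "\xE2\x80\xB6", // U+2036 REVERSED DOUBLE PRIME
        "\xEF\xBC\x82", // U+FF02 FULLWIDTH QUOTATION MARK
    );
    // One replacement string is applied to every needle in the array.
    return str_replace($quotes, $replacement, $text);
}

$foo = "\xE2\x80\x9CSome text\xE2\x80\x9D"; // “Some text”
$bar = replace_double_quotes($foo);
echo $bar, "\n"; // 'Some text'
```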