Tags: delphi, unicode, delphi-5, unicode-string, widestring

How do i construct a WideString with a diacritic in a non-Unicode Delphi version?


i am trying to construct a (test) WideString of:

á (U+00E1 LATIN SMALL LETTER A WITH ACUTE)

but using its decomposed form:

LATIN SMALL LETTER A (U+0061) + COMBINING ACUTE ACCENT (U+0301)

So i have the code fragment:

var
    test: WideString;
begin
   test := #$0061#$0301;
   MessageBoxW(0, PWideChar(test), 'Character with diacritic', MB_ICONINFORMATION or MB_OK);
end;

Except it doesn't appear to work:

[screenshot of the resulting message box; image not preserved]

This could be a bug in MessageBox, but i'm going to go ahead and say that it's more likely the bug is in my code.

Some other variations i have tried:

test := WideString(#$0061#$0301);


const
    SmallLetterLatinAWithAcuteDecomposed: WideString = #$0061#$0301;
test := SmallLetterLatinAWithAcuteDecomposed


test := #$0061+#$0301;  //doesn't compile; incompatible types


test := WideString(#$0061)+WideString(#$0301);  //doesn't compile; crashes compiler


test := 'a'+WideString(#$0301);  //doesn't compile; crashes compiler


//Arnauld's thought:
test := #$0301#$0061;

Solution

  • Best answer:

    const
        n: WideString = '';  //n=Nothing
    
    s := n+#$0061+#$0301;
    

    This fixes all cases i have below that otherwise fail.


    The only variant that works is to declare it as a constant:

    AccentAcute: WideString = #$0301;
    AccentAcute: WideString = WideChar($0301);
    AccentAcute: WideString = WideChar(#$0301);
    AccentAcute: WideString = WideString(#$0301);
    

    Sample Usage:

    s := 'Pasta'+AccentAcute;
    

    Constant-based syntaxes that do not work

    • AccentAcute: WideString = $0301;
      incompatible types
    • AccentAcute: WideString = #0301;
      gives [image not preserved]
    • AccentAcute: WideString = WideString($0301);
      invalid typecast
    • AccentAcute: WideString = WideString(#$0301);
      invalid typecast
    • AccentAcute: WideChar = WideChar(#0301);
      gives Pastai
    • AccentAcute: WideChar = WideChar($0301);
      gives Pasta´

    Other syntaxes that fail

    • 'Pasta'+WideChar($0301)
      gives Pasta´
    • 'Pasta'+#$0301
      gives Pasta´
    • WideString('Pasta')+#$0301
      gives [image not preserved]

    Summary of all constant-based syntaxes i could think up:

    AccentAcute: WideString =            #$0301;   //works
    AccentAcute: WideString =   WideChar(#$0301);  //works
    AccentAcute: WideString = WideString(#$0301);  //works
    AccentAcute: WideString =             $0301;   //incompatible types
    AccentAcute: WideString =    WideChar($0301);  //works
    AccentAcute: WideString =  WideString($0301);  //invalid typecast
    
    AccentAcute: WideChar =            #$0301;     //fails, gives Pasta´
    AccentAcute: WideChar =   WideChar(#$0301);    //fails, gives Pasta´
    AccentAcute: WideChar = WideString(#$0301);    //incompatible types
    AccentAcute: WideChar =             $0301;     //incompatible types
    AccentAcute: WideChar =    WideChar($0301);    //fails, gives Pasta´
    AccentAcute: WideChar =  WideString($0301);    //invalid typecast
    

    Rearranging WideChar can work, as long as you only append to a variable:

    //Works
    t := '0123401234012340123';
    t := t+WideChar(#$D840);
    t := t+WideChar(#$DC00);
    
    //fails
    t := '0123401234012340123'+WideChar(#$D840);
    t := t+WideChar(#$DC00);
    
    //fails
    t := '0123401234012340123'+WideChar(#$D840)+WideChar(#$DC00);
    
    //works
    t := '0123401234012340123';
    t := t+WideChar(#$D840)+WideChar(#$DC00);
    
    //works
    t := '';
    t := t+WideChar(#$D840)+WideChar(#$DC00);
    
    //fails; gives junk
    t := ''+WideChar(#$D840)+WideChar(#$DC00);
    
    //crashes compiler
    t := WideString('')+WideChar(#$D840)+WideChar(#$DC00);
    
    //doesn't compile
    t := WideChar(#$D840)+WideChar(#$DC00);
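
    For reference, the two code units used in these tests, $D840 and $DC00, are the UTF-16 surrogate pair for U+20000. A minimal sketch of that arithmetic follows; the helper name CodePointToSurrogates is hypothetical (not part of any Delphi unit), and it assumes a Cardinal is 32 bits wide:

    ```pascal
    program SurrogatePairDemo;

    {$APPTYPE CONSOLE}

    uses
      SysUtils;

    // Hypothetical helper: splits a code point above U+FFFF into its
    // UTF-16 high and low surrogate code units.
    procedure CodePointToSurrogates(CodePoint: Cardinal; var HighUnit, LowUnit: Word);
    begin
      Dec(CodePoint, $10000);                          // offset past the BMP
      HighUnit := Word($D800 or (CodePoint shr 10));   // top 10 bits
      LowUnit  := Word($DC00 or (CodePoint and $3FF)); // bottom 10 bits
    end;

    var
      HighUnit, LowUnit: Word;
    begin
      CodePointToSurrogates($20000, HighUnit, LowUnit);
      WriteLn(IntToHex(HighUnit, 4), ' ', IntToHex(LowUnit, 4));  // prints D840 DC00
    end.
    ```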
    

    Definitely hitting up against compiler nonsense: cases that weren't tested fully. Yes, i know David, we should upgrade.
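
    Since appending to a variable behaves better than concatenating literals, another option might be to skip string-literal expressions altogether and fill the WideString one code unit at a time. This is only a sketch: the helper WideStringFromCodeUnits is hypothetical, and i have not verified it on Delphi 5.

    ```pascal
    program BuildFromCodeUnits;

    uses
      Windows;

    // Hypothetical helper: builds a WideString from raw UTF-16 code units
    // by indexed assignment, so no string literal is ever concatenated.
    function WideStringFromCodeUnits(const Units: array of Word): WideString;
    var
      i: Integer;
    begin
      SetLength(Result, High(Units) + 1);
      for i := 0 to High(Units) do
        Result[i + 1] := WideChar(Units[i]);  // WideString is 1-based, open arrays are 0-based
    end;

    var
      s: WideString;
    begin
      s := WideStringFromCodeUnits([$0061, $0301]);  // a + COMBINING ACUTE ACCENT
      MessageBoxW(0, PWideChar(s), 'Character with diacritic', MB_ICONINFORMATION or MB_OK);
    end.
    ```

    The same call with [$D840, $DC00] would cover the surrogate pair case. Whether the message box actually renders the combining mark still depends on the font and the Windows version.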