Tags: delphi, utf-8, firebird, udf

Delphi Firebird UDF with UTF8 strings


We are trying to write a UDF in Delphi (10 Seattle) for our Firebird 2.5 database that removes certain characters from the input string. All string fields in the database use character set UTF8 with collation UNICODE_CI_AI.

The function should remove characters such as space, . ; : / \ and others from the string. It works fine for strings that contain only ASCII characters (code <= 127). As soon as the string contains a character with a code above 127, the UDF fails. We tried using PChar instead of PAnsiChar parameters, without success. As a workaround, we currently check whether each character has a code above 127 and, if so, remove it from the string as well.

What we want, though, is a UDF that returns the original string with only the punctuation characters removed.

This is our code so far:

    unit UDFs;

    interface

    uses ib_util;

    function UDF_RemovePunctuations(InputString: PAnsiChar): PAnsiChar; cdecl;

    implementation

    uses SysUtils, AnsiStrings, Classes;

    //FireBird declaration:
    //DECLARE EXTERNAL FUNCTION UDF_REMOVEPUNCTUATIONS
    //  CSTRING(500)
    //RETURNS CSTRING(500) FREE_IT
    //ENTRY_POINT 'UDF_RemovePunctuations' MODULE_NAME 'FB_UDF.dll';
    function UDF_RemovePunctuations(InputString: PAnsiChar): PAnsiChar;
    const
      PunctuationChars = [' ', ',', '.', ';', '/', '\', '''', '"','(', ')'];
    var
      I: Integer;
      S, NewS: String;
    begin
      S := UTF8ToUnicodeString(InputString);

      For I := 1 to Length(S) do
      begin
        If Not CharInSet(S[I], PunctuationChars)
        then begin
          If S[I] <= #127
          then NewS := NewS + S[I];
        end;
      end;

      Result := ib_util_malloc(Length(NewS) + 1);
      NewS := NewS + #0;
      AnsiStrings.StrPCopy(Result, NewS);
    end;

    end.

When we remove the `<= #127` check, NewS contains all the expected characters (minus the punctuation, of course), so we think things go wrong in the StrPCopy call.

Any help would be appreciated!


Solution

  • Thanks to LU RD, I got this working.

    The answer was to declare my string variables as Utf8String instead of String, and to stop converting the input string with UTF8ToUnicodeString.

    I have adapted my code like this:

        //FireBird declaration:
        //DECLARE EXTERNAL FUNCTION UDF_REMOVEPUNCTUATIONS
        //  CSTRING(500)
        //RETURNS CSTRING(500) FREE_IT
        //ENTRY_POINT 'UDF_RemovePunctuations' MODULE_NAME 'CarfacPlus_UDF.dll';
        function UDF_RemovePunctuations(InputString: PAnsiChar): PAnsiChar;
        const
          PunctuationChars = [' ', ',', '.', ';', '/', '\', '''', '"','(', ')', '-',
                              '+', ':', '<', '>', '=', '[', ']', '{', '}'];
        var
          I: Integer;
          S: Utf8String;
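        begin
          // Take the input as UTF-8 bytes; nothing is converted to a UTF-16 String.
          S := InputString;

          // Walk backwards so Delete never shifts a position still to be checked.
          For I := Length(S) downto 1 do
            If CharInSet(S[I], PunctuationChars)
            then Delete(S, I, 1);

          // Length(S) is a byte count here; the extra byte is for the terminating #0.
          Result := ib_util_malloc(Length(S) + 1);
          AnsiStrings.StrPCopy(Result, AnsiString(S));
        end;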
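
A note on why the Utf8String version behaves differently, plus a possible variant. In UTF-8, every byte of a multi-byte character has a value of 128 or more, so deleting single ASCII punctuation bytes cannot split a character, and Length on a Utf8String is the byte count that ib_util_malloc needs. The code in the question most likely failed because passing a String to AnsiStrings.StrPCopy implicitly converts it to the system ANSI code page, so the bytes handed back to Firebird were no longer valid UTF-8. The sketch below is not the code from the answer: it assumes the same ib_util unit and the same Firebird declaration, and it copies the bytes with Move so that no string typecast is involved in the final step.

```pascal
unit UDFs;

interface

uses ib_util; // assumed to export ib_util_malloc, as in the question's unit

function UDF_RemovePunctuations(InputString: PAnsiChar): PAnsiChar; cdecl;

implementation

uses SysUtils; // CharInSet

function UDF_RemovePunctuations(InputString: PAnsiChar): PAnsiChar; cdecl;
const
  PunctuationChars = [' ', ',', '.', ';', '/', '\', '''', '"', '(', ')', '-',
                      '+', ':', '<', '>', '=', '[', ']', '{', '}'];
var
  I: Integer;
  S: Utf8String;
begin
  // Treat the incoming bytes as UTF-8; nothing is converted to UTF-16.
  S := InputString;

  // Go backwards so that Delete never shifts a position still to be checked.
  for I := Length(S) downto 1 do
    if CharInSet(S[I], PunctuationChars) then
      Delete(S, I, 1);

  // Length(S) counts bytes here; one extra byte holds the terminating #0.
  Result := ib_util_malloc(Length(S) + 1);
  if Result <> nil then
    // PAnsiChar(S) points at a #0-terminated buffer even when S is empty,
    // so copying Length(S) + 1 bytes includes the terminator.
    Move(PAnsiChar(S)^, Result^, Length(S) + 1);
end;

end.
```

Because only bytes below 128 are ever removed, a value such as 'Müller, J.' should come back as 'MüllerJ' with the ü intact.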