
Removing a char from a string in Pascal causes question marks in the console


I am trying to write a simple program that removes all 'o' letters from a string. Example:

I love cats 

Output:

I lve cats

I wrote the following code:

var 
    x:integer;
    text:string;
    text_no_o:string;
begin
text:='I love cats';
   for x := 0  to  Length(text) do
   //writeln(Ord(text[6]));
   if(Ord(text[x])=111) then    
   else
    text_no_o[x]:=text[x];
    write(text_no_o);
   end.
   begin
   end;
end.

When the text is in English, the program works fine. But when I change it to Russian, it prints question marks in the console. Here is the code with small modifications for Russian:


var 
    x:integer;
    text:string;
    text_no_o:string;
begin
text:='Русский язык мой родной';
   for x := 0  to  Length(text) do
   //writeln(Ord(text[6]));
   if(Ord(text[x])=190) then    
   else
    text_no_o[x]:=text[x];
    write(text_no_o);
   end.
   begin
   end;
end.

The result I get in the console is:

Русский язык м�й р�дн�й

I expect to receive:

Русский язык мй рднй

As I understand it, the problem may be caused by incorrect encoding settings in the console, so I should force Pascal to use CP1252 instead of ANSI.

I am using Free Pascal Compiler version 3.2.0+dfsg-12 for Linux. P.S. I am not allowed to use StringReplace or Pos.


Solution

  • The string is most likely UTF-8 encoded, so the Cyrillic о is stored as two chars (bytes), $D0 $BE. Your code drops only the second one, $BE (= 190). That leaves a lone $D0 behind, which is not valid UTF-8 on its own and shows up as a question mark. You need to remove both chars, but you cannot simply test the value of each char by itself, because its meaning depends on the surrounding chars.

    Here is one way to do it, remembering the current state (outside the letter, or just after its first byte):

    var
      c: char;
      text: string;
      state: (sOutside, sAfterD0);
    begin
      text:= 'Русский язык мой родной';
      state:= sOutside;
      for c in text do
      begin
        if state = sOutside then
        begin
          if c = #$D0 then // may be the start of the letter
            state := sAfterD0
          else
            write(c); // output this char because not part of letter
        end
        else if state = sAfterD0 then
        begin
          if c = #$BE then state := sOutside // finished skipping
          else
          begin
            // chars do not form letter so output skipped char
            write(#$D0, c);
            state := sOutside;
          end;
        end
      end;
      writeln;
    end.
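
If the cleaned text is needed as a string rather than printed char by char, the same two-byte idea can be sketched as a function that looks one byte ahead instead of keeping a state variable. This is only a minimal sketch, not part of the original answer: the name `RemoveCyrillicO` is a hypothetical helper, and it assumes the source file is saved as UTF-8 with no `{$codepage}` directive, so the literal reaches the program byte for byte. It uses neither StringReplace nor Pos.

```pascal
program RemoveLetterDemo;

{$mode objfpc}{$H+}

// Hypothetical helper (not from the original answer): returns a copy of s
// without the byte pair $D0 $BE, i.e. the UTF-8 encoding of Cyrillic "о".
function RemoveCyrillicO(const s: string): string;
var
  i: Integer;
begin
  Result := '';
  i := 1;                        // Pascal strings are indexed from 1, not 0
  while i <= Length(s) do
  begin
    // Look one byte ahead; short-circuit evaluation (the FPC default)
    // keeps s[i + 1] from being read past the end of the string.
    if (s[i] = #$D0) and (i < Length(s)) and (s[i + 1] = #$BE) then
      Inc(i, 2)                  // skip both bytes of the letter
    else
    begin
      Result := Result + s[i];   // append, so no gap is left behind
      Inc(i);
    end;
  end;
end;

begin
  // Assumption: this file is saved as UTF-8 and the terminal expects UTF-8.
  writeln(RemoveCyrillicO('Русский язык мой родной'));
end.
```

Two details differ from the code in the question: the loop starts at index 1 rather than 0, and kept bytes are appended to the result instead of being assigned to the same index, so the result has no unset positions where a letter was removed. Like the state-based version, this only handles the lowercase о; the capital О has a different second byte.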