Tags: delphi, lazarus, freepascal

Is there a simple way to extract numbers from a string following certain rules?


I need to pull numbers out of a string and put them into a list. There are some rules to this, however, such as identifying whether each extracted number is an Integer or a Float.

The task sounds simple enough, but I am finding myself more and more confused as time goes by and could really do with some guidance.


Take the following test string as an example:

There are test values: P7 45.826.53.91.7, .5, 66.. 4 and 5.40.3.

The rules to follow when parsing the string are as follows:

  • Numbers cannot be preceded by a letter; for example, the 7 in P7 is ignored.

  • If a number is not followed by a decimal point, it is an Integer.

  • If a number is followed by a decimal point, it is a Float, e.g. 5.

  • ~ If more digits follow the decimal point, the number is still a Float, e.g. 5.40

  • ~ A further decimal point ends the number, e.g. 5.40.3. becomes (5.40 Float) and (3. Float)

  • If a non-digit such as a letter follows a decimal point, e.g. 3.H, still add 3. to the list as a Float (even though it is technically not valid).
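The decimal-point rules above fit into one small routine that reads a single number starting at a digit. The following is a minimal Free Pascal sketch, not a definitive implementation. `ReadNumber` and `DescribeFirstNumber` are hypothetical names. The sketch also assumes that the caller checks the "not preceded by a letter" rule before calling `ReadNumber`.

```pascal
program ReadNumberDemo;

{$mode objfpc}{$H+}

// True if Ch is an ASCII digit (plain comparisons instead of a set test).
function IsDigit(Ch: Char): Boolean;
begin
  Result := (Ch >= '0') and (Ch <= '9');
end;

// Hypothetical helper: reads one number starting at S[I], which must be a digit.
//   digits            -> Integer, e.g. '7'
//   digits '.'        -> Float, e.g. '66.' or the '3.' in '3.H'
//   digits '.' digits -> Float, e.g. '5.40'
// A second '.' is left unread, so it ends the number: '5.40.3' stops after '5.40'.
// On return, I indexes the first character after the number.
function ReadNumber(const S: string; var I: Integer; out IsFloat: Boolean): string;
var
  Start: Integer;
begin
  Start := I;
  while (I <= Length(S)) and IsDigit(S[I]) do
    Inc(I);
  IsFloat := (I <= Length(S)) and (S[I] = '.');
  if IsFloat then
  begin
    Inc(I); // consume the single decimal point
    while (I <= Length(S)) and IsDigit(S[I]) do
      Inc(I);
  end;
  Result := Copy(S, Start, I - Start);
end;

// Hypothetical helper: formats the number at the start of S as 'value (Kind)'.
function DescribeFirstNumber(const S: string): string;
var
  I: Integer;
  IsFloat: Boolean;
begin
  I := 1;
  Result := ReadNumber(S, I, IsFloat);
  if IsFloat then
    Result := Result + ' (Float)'
  else
    Result := Result + ' (Integer)';
end;

begin
  WriteLn(DescribeFirstNumber('7,'));      // 7 (Integer)
  WriteLn(DescribeFirstNumber('5.40.3'));  // 5.40 (Float)
  WriteLn(DescribeFirstNumber('3.H'));     // 3. (Float)
  WriteLn(DescribeFirstNumber('66..'));    // 66. (Float)
end.
```

`ReadNumber` never consumes a second decimal point, so the caller can keep scanning from `I`. The leftover `.` is not a digit, so the next number (the 3 in 5.40.3.) starts just after it.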

Example 1

To make this a little clearer, the desired output for the test string quoted above is as follows:

[Image: the test string, with Float numbers highlighted in light blue and Integers in pale red]

In the image above, light blue marks Float numbers and pale red marks single Integers (note also how Floats joined together are split into separate Floats).

  • 45.826 (Float)
  • 53.91 (Float)
  • 7 (Integer)
  • 5 (Integer)
  • 66 . (Float)
  • 4 (Integer)
  • 5.40 (Float)
  • 3 . (Float)

Note: the spaces in "66 ." and "3 ." above are deliberate and are only there because of formatting; the values themselves are "66." and "3.".

Example 2

Anoth3r Te5.t string .4 abc 8.1Q 123.45.67.8.9

[Image: illustration of Example 2]

  • 4 (Integer)
  • 8.1 (Float)
  • 123.45 (Float)
  • 67.8 (Float)
  • 9 (Integer)

To give a better idea, I created a new test project, which looks like this:

[Image: screenshot of the test project, with the desired output and actual output listboxes]


Now on to the actual task. My idea was to read the string one character at a time, identify the valid numbers according to the rules above, and add them to a list.

To the best of my ability, this was as far as I could get:

[Image: screenshot of the output from the first attempt]

The code is as follows:

unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    btnParseString: TButton;
    edtTestString: TEdit;
    Label1: TLabel;
    Label2: TLabel;
    Label3: TLabel;
    lstDesiredOutput: TListBox;
    lstActualOutput: TListBox;
    procedure btnParseStringClick(Sender: TObject);
  private
    FDone: Boolean;
    FIdx: Integer;
    procedure ParseString(const Str: string; var OutValue, OutKind: string);
  public
    { public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

{ TForm1 }

procedure TForm1.ParseString(const Str: string; var OutValue, OutKind: string);
var
  CH1, CH2: Char;
begin
  Inc(FIdx);
  CH1 := Str[FIdx];

  case CH1 of
    '0'..'9': // Found a number
    begin
      CH2 := Str[FIdx - 1];
      if not (CH2 in ['A'..'Z']) then
      begin
        OutKind := 'Integer';

        // Try to determine float...

        //while (CH1 in ['0'..'9', '.']) do
        //begin
        //  case Str[FIdx] of
        //    '.':
        //    begin
        //      CH2 := Str[FIdx + 1];
        //      if not (CH2 in ['0'..'9']) then
        //      begin
        //        OutKind := 'Float';
        //        //Inc(FIdx);
        //      end;
        //    end;
        //  end;
        //end;
      end;
      OutValue := Str[FIdx];
    end;
  end;

  FDone := FIdx = Length(Str);
end;

procedure TForm1.btnParseStringClick(Sender: TObject);
var
  S, SKind: string;
begin
  lstActualOutput.Items.Clear;
  FDone := False;
  FIdx := 0;

  repeat
    ParseString(edtTestString.Text, S, SKind);
    if (S <> '') and (SKind <> '') then
    begin
      lstActualOutput.Items.Add(S + ' (' + SKind + ')');
    end;
  until
    FDone = True;
end;

end.

It clearly doesn't give the desired output (the failed code has been commented out), and my approach is likely wrong, but I feel I only need to make a few changes here and there to get a working solution.

At this point I am rather confused and quite lost, despite thinking the answer is close. The task is becoming increasingly infuriating, and I would really appreciate some help.
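Two details in the letter check of the code above are likely part of the problem.

- `Str[FIdx - 1]` is read even when `FIdx` is 1. That reads `Str[0]`, which is not a valid character index for a long string, because the first character is `Str[1]`.
- `['A'..'Z']` only matches uppercase letters, so the lowercase `h` in `Anoth3r` would not stop the 3.

Below is a minimal Free Pascal sketch of a bounds-safe check. `IsLetter` and `PrecededByLetter` are hypothetical helper names, and only ASCII letters are treated as letters.

```pascal
program PrecededByLetterDemo;

{$mode objfpc}{$H+}

// True for ASCII letters in either case. A test such as CH2 in ['A'..'Z']
// misses lowercase letters like the 'h' in 'Anoth3r'.
function IsLetter(Ch: Char): Boolean;
begin
  Result := ((Ch >= 'A') and (Ch <= 'Z')) or ((Ch >= 'a') and (Ch <= 'z'));
end;

// Hypothetical helper: True if the character just before S[I] is a letter.
// Testing I > 1 first (with default short-circuit evaluation) means S[0]
// is never read, since the first character of a long string is S[1].
function PrecededByLetter(const S: string; I: Integer): Boolean;
begin
  Result := (I > 1) and IsLetter(S[I - 1]);
end;

const
  Test = 'Anoth3r Te5.t string .4 abc 8.1Q';
var
  I: Integer;
begin
  // Report, for every digit, whether it directly follows a letter.
  for I := 1 to Length(Test) do
    if (Test[I] >= '0') and (Test[I] <= '9') then
      WriteLn('Digit ', Test[I], ' at ', I, ': preceded by letter = ',
        PrecededByLetter(Test, I));
end.
```

Checking only the previous character is not enough on its own. In a run such as P72, the 2 is preceded by 7, not by a letter. So once the first digit of a number is rejected, the scanner should skip the rest of that number rather than test each digit again.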


EDIT 1

Here I got a little closer, as there are no longer duplicate numbers, but the result is still clearly wrong.

[Image: screenshot of the output after EDIT 1]

unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    btnParseString: TButton;
    edtTestString: TEdit;
    Label1: TLabel;
    Label2: TLabel;
    Label3: TLabel;
    lstDesiredOutput: TListBox;
    lstActualOutput: TListBox;
    procedure btnParseStringClick(Sender: TObject);
  private
    FDone: Boolean;
    FIdx: Integer;
    procedure ParseString(const Str: string; var OutValue, OutKind: string);
  public
    { public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

{ TForm1 }

// Prepare to pull hair out!
procedure TForm1.ParseString(const Str: string; var OutValue, OutKind: string);
var
  CH1, CH2: Char;
begin
  Inc(FIdx);
  CH1 := Str[FIdx];

  case CH1 of
    '0'..'9': // Found the start of a new number
    begin
      CH1 := Str[FIdx];

      // make sure previous character is not a letter
      CH2 := Str[FIdx - 1];
      if not (CH2 in ['A'..'Z']) then
      begin
        OutKind := 'Integer';

        // Try to determine float...
        //while (CH1 in ['0'..'9', '.']) do
        //begin
        //  OutKind := 'Float';
        //  case Str[FIdx] of
        //    '.':
        //    begin
        //      CH2 := Str[FIdx + 1];
        //      if not (CH2 in ['0'..'9']) then
        //      begin
        //        OutKind := 'Float';
        //        Break;
        //      end;
        //    end;
        //  end;
        //  Inc(FIdx);
        //  CH1 := Str[FIdx];
        //end;
      end;
      OutValue := Str[FIdx];
    end;
  end;

  OutValue := Str[FIdx];
  FDone := Str[FIdx] = #0;
end;

procedure TForm1.btnParseStringClick(Sender: TObject);
var
  S, SKind: string;
begin
  lstActualOutput.Items.Clear;
  FDone := False;
  FIdx := 0;

  repeat
    ParseString(edtTestString.Text, S, SKind);
    if (S <> '') and (SKind <> '') then
    begin
      lstActualOutput.Items.Add(S + ' (' + SKind + ')');
    end;
  until
    FDone = True;
end;

end.

My question is: how can I extract numbers from a string, add them to a list, and determine whether each number is an Integer or a Float?

The left, pale green listbox (desired output) shows what the results should be; the right, pale blue listbox (actual output) shows what I actually got.

Please advise. Thanks.

Note: I re-added the Delphi tag because I also use XE7, so please don't remove it. Although this particular problem is in Lazarus, my eventual solution should work in both XE7 and Lazarus.


Solution

  • Your rules are rather complex, so you can try to build a finite state machine (FSM; more precisely a DFA, a deterministic finite automaton).

    Every character causes a transition between states.

    For example, when you are in the state "integer started" and meet a space, you yield the integer value and the FSM goes into the state "anything wanted".

    If you are in the state "integer started" and meet '.', the FSM goes into the state "float or integer list started", and so on.
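As a rough illustration of this suggestion, here is a minimal Free Pascal sketch of such a state machine. It is a sketch under stated assumptions, not a definitive implementation:

- The state names (`ssIdle`, `ssWord`, `ssInteger`, `ssFraction`), `ExtractNumbers` and `ExtractNumbersText` are hypothetical.
- Only ASCII letters count as letters.
- A number that starts directly after a letter is skipped as a whole, so P7.5 would yield nothing.

Character tests use plain comparisons rather than `in` set tests. The intent is that the scanning routine also compiles under Delphi XE7, where `Char` is a `WideChar`.

```pascal
program NumberFsm;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  // Hypothetical states. ssIdle: not in a number; ssWord: just after a letter
  // (a number starting here is skipped); ssInteger: digits before any '.';
  // ssFraction: a '.' was read, so the number is a Float.
  TScanState = (ssIdle, ssWord, ssInteger, ssFraction);

function IsDigit(Ch: Char): Boolean;
begin
  Result := (Ch >= '0') and (Ch <= '9');
end;

function IsLetter(Ch: Char): Boolean;
begin
  Result := ((Ch >= 'A') and (Ch <= 'Z')) or ((Ch >= 'a') and (Ch <= 'z'));
end;

// Scans S once, left to right, adding 'value (Kind)' lines to Items.
procedure ExtractNumbers(const S: string; Items: TStrings);
var
  State: TScanState;
  Start: Integer;
  Keep: Boolean; // False if the current number started right after a letter

  // Adds S[Start..StopBefore - 1] unless the number is being skipped.
  procedure Emit(StopBefore: Integer; IsFloat: Boolean);
  const
    Kinds: array[Boolean] of string = ('Integer', 'Float');
  begin
    if Keep then
      Items.Add(Copy(S, Start, StopBefore - Start) + ' (' + Kinds[IsFloat] + ')');
  end;

var
  I: Integer;
begin
  State := ssIdle;
  Start := 0;
  Keep := False;
  for I := 1 to Length(S) do
    case State of
      ssIdle, ssWord:
        if IsDigit(S[I]) then
        begin
          Start := I;
          Keep := State = ssIdle;
          State := ssInteger;
        end
        else if IsLetter(S[I]) then
          State := ssWord
        else
          State := ssIdle;
      ssInteger:
        if S[I] = '.' then
          State := ssFraction // digits followed by '.' make a Float
        else if not IsDigit(S[I]) then
        begin
          Emit(I, False);
          if IsLetter(S[I]) then State := ssWord else State := ssIdle;
        end;
      ssFraction:
        if not IsDigit(S[I]) then // any non-digit, even a second '.', ends it
        begin
          Emit(I, True);
          if IsLetter(S[I]) then State := ssWord else State := ssIdle;
        end;
    end;
  // A number that runs to the end of S has not been emitted yet.
  if State in [ssInteger, ssFraction] then
    Emit(Length(S) + 1, State = ssFraction);
end;

// Demo/test wrapper: all results joined with '; '.
function ExtractNumbersText(const S: string): string;
var
  List: TStringList;
  J: Integer;
begin
  List := TStringList.Create;
  try
    ExtractNumbers(S, List);
    Result := '';
    for J := 0 to List.Count - 1 do
      if J = 0 then Result := List[J] else Result := Result + '; ' + List[J];
  finally
    List.Free;
  end;
end;

begin
  WriteLn(ExtractNumbersText('There are test values: P7 45.826.53.91.7, .5, 66.. 4 and 5.40.3.'));
  WriteLn(ExtractNumbersText('Anoth3r Te5.t string .4 abc 8.1Q 123.45.67.8.9'));
end.
```

The extra `Keep` flag records whether the current number started right after a letter. Folding it into the enum would only duplicate the two number states. In the form, `ExtractNumbers(edtTestString.Text, lstActualOutput.Items)` would fill the list box directly, since `TListBox.Items` is a `TStrings`.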