lazarus freepascal

Find & Extract hashtags in text


I'm looking for an easy/quick way to identify and extract hashtags from a string, and temporarily store them separately - e.g.:

If I have the following string:

2017-08-31 This is a useless sentence being used as an example. #Example #Date:2017-09-01 #NothingWow (and then some more text for good measure).

Then I want to be able to get this:

#Example
#Date:2017-09-01
#NothingWow

I figured storing them in a TStringList should be sufficient until I'm done. I just need to store them outside of the original string for easier cross-referencing; then, if the original string changes, I add them back at the end. (That part is easy. It's the extracting part I'm having trouble with.)

It should start at the # and end/break when it encounters a [space].

The way I initially planned it was to use Boolean flags (defaulted to False), then check for the different hashtags, set them to true if found, and extract anything after a [:] separately. (but I'm sure there is a better way of doing it)

Any advice will be greatly appreciated.


Solution

  • The following shows a simple console application which you could use as the basis for a solution. It works because assigning your input string to the DelimitedText property of a StringList causes the StringList to split the input into a series of space-delimited items. It is then a simple matter to look for the ones which start with a #.

    The code is written as a Delphi console application but should be trivial to convert to Lazarus/FPC.

    Code:

    program HashTags;
    
    {$APPTYPE CONSOLE}
    
    uses
      Classes, SysUtils;
    
    procedure TestHashTags;
    var
      TL : TStringList;
      S : String;
      i : Integer;
    begin
      TL := TStringList.Create;
      try
        S := '2017-08-31 This is a useless sentence being used as an example. #Example #Date:2017-09-01 #NothingWow (and then some more text for good measure)';
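        // Make the delimiter explicit: the default Delimiter is ',',
        // so commas in the text would otherwise split tokens too.
        TL.Delimiter := ' ';
        TL.DelimitedText := S;
        for i := 0 to TL.Count - 1 do begin
          // Keep only the items that begin with '#'.
          if Pos('#', TL[i]) = 1 then
            writeln(i, ' ', TL[i]);
        end;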
      finally
        TL.Free;
      end;
      readln;
    end;
    
    begin
      TestHashTags;
    end.
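
    The question also asks for the hashtags to be stored in a TStringList and for anything after a [:] to be picked out separately. The following is a minimal sketch of one way that might be done, not a definitive implementation. ExtractHashTags is a hypothetical helper name introduced here for illustration; it uses the same DelimitedText approach as above and should compile under both Delphi and FPC.

```pascal
program HashTagList;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper (not part of the original answer): appends every
// space-delimited item of Source that starts with '#' to Tags.
procedure ExtractHashTags(const Source: string; Tags: TStrings);
var
  Items: TStringList;
  i: Integer;
begin
  Items := TStringList.Create;
  try
    Items.Delimiter := ' ';
    Items.DelimitedText := Source;
    for i := 0 to Items.Count - 1 do
      // Skip a lone '#' that has no tag name after it.
      if (Length(Items[i]) > 1) and (Items[i][1] = '#') then
        Tags.Add(Items[i]);
  finally
    Items.Free;
  end;
end;

var
  Tags: TStringList;
  i, p: Integer;
begin
  Tags := TStringList.Create;
  try
    ExtractHashTags('2017-08-31 This is a useless sentence being used as ' +
      'an example. #Example #Date:2017-09-01 #NothingWow (and then some ' +
      'more text for good measure)', Tags);
    for i := 0 to Tags.Count - 1 do
    begin
      // Split "name:value" tags at the first ':'; plain tags have none.
      p := Pos(':', Tags[i]);
      if p > 0 then
        WriteLn(Copy(Tags[i], 1, p - 1), ' -> ', Copy(Tags[i], p + 1, MaxInt))
      else
        WriteLn(Tags[i]);
    end;
  finally
    Tags.Free;
  end;
end.
```

    For the example string this should print #Example, then #Date -> 2017-09-01, then #NothingWow, one per line. Because the tags live in their own TStringList, they can be appended back to the original string later, for instance by reading Tags.DelimitedText after setting Tags.Delimiter to a space.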