
Finding phrases in a string and the frequency of each phrase


I am working on an F# script that finds phrases in a given string or text, along with the frequency of each phrase.

A phrase is two or more words.

I know how to do this in other languages, but I'm interested in doing it with anonymous functions in F#, which I'm currently learning.

This is a complex but useful problem, since each phrase spans two or more words.

What I have so far:

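    open System

    // Returns true if `phrase` occurs in `text` starting at a word boundary.
    let containsPhrase (phrase: string) (text: string) =
        // Try a match at `index` while the phrase still fits in the rest of the text.
        let rec contains index =
            if index <= text.Length - phrase.Length then compare index
            else false
        // Compare the phrase against the text at `index`; on a mismatch, skip to the next word.
        and compare index =
            if String.Compare(text, index, phrase, 0, phrase.Length) <> 0
            then nextWord index
            else true
        // Advance to the character after the next space, or give up if there is none.
        and nextWord index =
            let index = text.IndexOf(' ', index)
            if index >= 0 then contains (index + 1)
            else false
        contains 0

    let Phrases = ["Good morning"; "Take care"; "black Friday"]

    // `text` is the input string to search; its definition is not shown here.
    for phrase in Phrases do
        printfn "[%A] was found %b" phrase (containsPhrase (phrase.ToLower()) text)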

I found a solution for the first part of the problem, but after many attempts I am still lost on how to count how many times each phrase is used in the string.

The code above checks whether each given phrase occurs in the string.

Could anyone please help me add a counter for the frequency of each phrase?


Solution

  • Something like this?

    let text = """
    Good morning Take care black Friday
    Good morning Take care black Friday
    Good morning Take care black Friday
    Good morning Take care black Friday
    Good morning Take care black Friday
    """
    
    let phrases = ["Good morning";"Take care";"black Friday"] 
    
    let occurrences (phrase: string) =
      let rec loop (index: int) count =
        match text.IndexOf(phrase, index) with
        | -1 -> count
        | n -> loop (n + phrase.Length) (count + 1)
      loop 0 0
    
    phrases |> List.map (fun s -> s, occurrences s)

    // F# Interactive output:
    // val it : (string * int) list =
    //   [("Good morning", 5); ("Take care", 5); ("black Friday", 5)]
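
As a possible extension of the answer above, here is a minimal sketch of two variations. The names `countOccurrences`, `phraseFrequencies`, and `sample` are hypothetical and not part of the original answer. The first takes the text as a parameter and matches case-insensitively, on the assumption that "Good morning" and "good morning" should count as the same phrase. The second does not need a predefined phrase list: it counts every run of `n` consecutive words, using an anonymous function as the grouping key, which may be closer to what the question is after.

```fsharp
open System

// Hypothetical helper: counts non-overlapping occurrences of `phrase` in `text`,
// ignoring case (an assumption; the original answer is case-sensitive).
let countOccurrences (phrase: string) (text: string) =
    let rec loop (index: int) count =
        match text.IndexOf(phrase, index, StringComparison.OrdinalIgnoreCase) with
        | -1 -> count
        | n -> loop (n + phrase.Length) (count + 1)
    // An empty phrase would match at every index and never advance, so return 0.
    if String.IsNullOrEmpty phrase then 0 else loop 0 0

// Hypothetical helper: lists every run of `n` consecutive words with its frequency,
// most frequent first. Words are split on whitespace only; punctuation is kept as-is.
let phraseFrequencies (n: int) (text: string) =
    text.Split([| ' '; '\n'; '\r'; '\t' |], StringSplitOptions.RemoveEmptyEntries)
    |> Seq.windowed n
    |> Seq.countBy (fun words -> String.Join(" ", words).ToLowerInvariant())
    |> Seq.sortByDescending snd
    |> Seq.toList

let sample = "Good morning Take care good morning"

// Counts both "Good morning" and "good morning": evaluates to 2.
let goodMornings = countOccurrences "good morning" sample

// All two-word phrases in the sample, paired with their counts.
let twoWordPhrases = phraseFrequencies 2 sample
```

Note that `phraseFrequencies` uses overlapping windows, so in "Good morning Take care" both "good morning" and "morning take" are counted as phrases. Whether that is desirable depends on what should qualify as a phrase.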