Compare two string values, one from Tesseract output and the other from a .txt file


I have a program that uses Tesseract to analyze a screenshot taken on the computer. I also have a text file containing "F1 car Bahrain".

try
{
    var path = @"C:\source\repos\TEst1\packages\Tesseract.4.1.1";
    string LOG_PATH = "C:\\Desktop\\start.txt";

    // In a verbatim (@"") string a backslash is literal, so it must not be doubled.
    var sourceFilePath = @"C:\source\repos\Scrren\Scrren\bin\Debug\TestImage.png";
    using (var engine = new TesseractEngine(path, "eng"))
    {
        using (var img = Pix.LoadFromFile(sourceFilePath))
        {
            using (var page = engine.Process(img))
            {
                var results = page.GetText();

                string WordsFrom = File.ReadAllText(LOG_PATH);
                string WordsFromList = WordsFrom.ToLower();

                string Match = results.ToLower(); // ToLower() already returns a string, so ToString() is redundant.

                bool C = Match.Contains(WordsFromList);
                if (C)
                {
                    Console.WriteLine("Match");
                }
                else
                {
                    Console.WriteLine("No Match");
                }    
            }
        }
    }
}
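catch (Exception e)
{
    // Print the exception instead of silently swallowing it, so Tesseract or file errors stay visible.
    Console.WriteLine(e);
    Thread.Sleep(1500);
}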

The text Tesseract returns (after lowercasing) is:

"1 day ago cce sc ume f1 bahrain grand prix ~ start time, how ake nos video nea cea a] 8 reasons 2021 will go down in f1"

Obviously Tesseract isn't perfect, so some of it is gibberish, but the words f1 and bahrain are both in there, so I don't understand why bool C never becomes true. I am completely stumped and would greatly appreciate the help.

Printing the string "WordsFromList" to the console shows that it contains both f1 and bahrain as well.


Solution

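  • Contains only returns true when its entire argument appears in the string as one contiguous substring. The text file also contains the word "car", which is not in the Tesseract output, so the full phrase "f1 car bahrain" never matches. Split the phrase into words and check them individually. See the comments in the code below: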

    using System;
    using System.Linq; // Needed for the All() and Any() extension methods.
    
    string searchFor = "F1 car Bahrain";
    string searchIn = "1 day ago cce sc ume f1 bahrain grand prix ~ start time, how ake nos video nea cea a] 8 reasons 2021 will go down in f1";
    
    // Returns false because the complete string "F1 car Bahrain" does not occur as one contiguous substring of searchIn (this overload is also case-sensitive).
    Console.WriteLine($"Does {searchIn} contain {searchFor} => {searchIn.Contains(searchFor)}");
    
    var words = searchFor.Split(' '); // Result is a string[] with 3 words ("F1", "car", "Bahrain").
    
    // Note: Contains(string, StringComparison) requires .NET Core 2.1 or later. On .NET Framework, use searchIn.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0 instead.
    // Returns false because 'car' is not in the input string. The 'All()' extension method only returns true if all words are matched.
    Console.WriteLine($"Does {searchIn} contain {searchFor} => {words.All(word => searchIn.Contains(word, StringComparison.InvariantCultureIgnoreCase))}");
    // Returns true because 'F1' and 'Bahrain' are found in the input string (ignoring case). The 'Any()' extension method returns true if any word matches.
    Console.WriteLine($"Does {searchIn} contain {searchFor} => {words.Any(word => searchIn.Contains(word, StringComparison.InvariantCultureIgnoreCase))}");
    
    Console.ReadKey();
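
The same word-by-word check can be wrapped in small helpers so it is easy to reuse with the text read from the file. The following is only a minimal sketch: the names SplitWords, ContainsWord, ContainsAnyWord and ContainsAllWords are hypothetical, and the trailing "\r\n" on the sample phrase is an assumption standing in for a text file that ends with a newline. Splitting on all whitespace and removing empty entries keeps such a newline from becoming part of a word.

```csharp
using System;
using System.Linq;

// Hypothetical helper: split the phrase on spaces, tabs and line breaks,
// dropping empty entries so a trailing newline does not yield an empty word.
static string[] SplitWords(string phrase) =>
    phrase.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

// Case-insensitive substring test. IndexOf with a StringComparison argument
// is also available on .NET Framework, unlike Contains(string, StringComparison).
static bool ContainsWord(string text, string word) =>
    text.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;

// True if at least one word of the phrase occurs in the text.
static bool ContainsAnyWord(string text, string phrase) =>
    SplitWords(phrase).Any(word => ContainsWord(text, word));

// True only if every word of the phrase occurs in the text.
static bool ContainsAllWords(string text, string phrase) =>
    SplitWords(phrase).All(word => ContainsWord(text, word));

// Sample values from the question; the "\r\n" is an assumed trailing newline.
string ocrText = "1 day ago cce sc ume f1 bahrain grand prix ~ start time, how ake nos video nea cea a] 8 reasons 2021 will go down in f1";
string fileText = "F1 car Bahrain\r\n";

Console.WriteLine(ContainsAnyWord(ocrText, fileText));  // True: "f1" and "bahrain" are present
Console.WriteLine(ContainsAllWords(ocrText, fileText)); // False: "car" is missing
```

In the program from the question, fileText would be the result of File.ReadAllText(LOG_PATH) and ocrText the result of page.GetText(). Note that All() returns true for an empty sequence, so ContainsAllWords reports a match when the file contains no words at all. Add a separate check if that case matters.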