Hi, I am using C# in my project and I am trying to get output like the one described below.
string str1 = "Cat meet's a dog has";
string str2 = "Cat meet's a dog and a bird";
string[] str1Words = str1.ToLower().Split(' ');
string[] str2Words = str2.ToLower().Split(' ');
var uniqueWords = str2Words
    .Except(str1Words)
    .Concat(str1Words.Except(str2Words))
    .ToList();
This gives me the output and, bird, has, which is correct, but what I would like is something like below:
has - present in first string not present in second string
and a bird - not present in first string but present in second string
For example, a second use case:
string S1 = "Added";
string S2 = "Edited";
Here the output should be:
Added - present in first string not present in second string
Edited - not present in first string but present in second string
I would like some indication of which words are present in the first string but not in the second, and which are present in the second but not in the first. The comparison should be word by word rather than character by character. Can someone please help me with this? Any help would be appreciated. Thanks.
I suggest matching words (let a word be a sequence of letters and apostrophes) with the help of a regular expression, and then querying the matches for the two given strings. Please note that splitting doesn't take punctuation into account, and thus "cat", "cat," and "cat!" will be considered three different words:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
...
// A word is a run of Unicode letters and apostrophes
private static readonly Regex WordsRegex = new Regex(@"[\p{L}']+");

// presentAt: 1 - in text1 only, 2 - in text2 only, 3 - in both text1 and text2
private static List<(string word, int presentAt)> MyWords(string text1, string text2) {
    // Distinct words of text1, compared case-insensitively
    HashSet<string> words1 = WordsRegex
        .Matches(text1)
        .Cast<Match>()
        .Select(match => match.Value)
        .ToHashSet(StringComparer.OrdinalIgnoreCase);

    // Distinct words of text2, compared case-insensitively
    HashSet<string> words2 = WordsRegex
        .Matches(text2)
        .Cast<Match>()
        .Select(match => match.Value)
        .ToHashSet(StringComparer.OrdinalIgnoreCase);

    return words1
        // Use the same comparer as the sets; otherwise "Cat" and "cat" would both be listed
        .Union(words2, StringComparer.OrdinalIgnoreCase)
        .Select(word => (word, presentAt: (words1.Contains(word) ? 1 : 0) |
                                          (words2.Contains(word) ? 2 : 0)))
        .ToList();
}
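To illustrate the remark about punctuation, here is a minimal sketch that tokenizes the same text both ways. The class name `TokenizeSketch` and its two methods are hypothetical names used only for this illustration; the pattern is the one from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only
public static class TokenizeSketch {
    // Same pattern as in the answer: runs of letters and apostrophes
    private static readonly Regex WordsRegex = new Regex(@"[\p{L}']+");

    // Naive approach: punctuation stays glued to the neighbouring word
    public static string[] BySplit(string text) =>
        text.Split(' ');

    // Regex approach: punctuation is never part of a match
    public static string[] ByRegex(string text) =>
        WordsRegex.Matches(text)
                  .Cast<Match>()
                  .Select(match => match.Value)
                  .ToArray();
}
```

For the input "cat cat, cat!", `BySplit` returns three different tokens ("cat", "cat," and "cat!"), while `ByRegex` returns the word "cat" three times, so a later distinct or set operation sees a single word.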
Demo:
string str1 = "Cat meet's a dog has";
string str2 = "Cat meet's a dog and a bird";
var result = MyWords(str1, str2);
var report = string.Join(Environment.NewLine, result);
Console.Write(report);
Output:
(Cat, 3) # 3: in both str1 and str2
(meet's, 3) # 3: in both str1 and str2
(a, 3) # 3: in both str1 and str2
(dog, 3) # 3: in both str1 and str2
(has, 1) # 1: in str1 only
(and, 2) # 2: in str2 only
(bird, 2) # 2: in str2 only
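The numbers 1, 2 and 3 are bit flags combined with `|`. If bare integers feel too cryptic, one possible alternative (not part of the code above) is a flags enum. The names `Presence` and `PresenceSketch` below are hypothetical and serve only as a sketch of the idea.

```csharp
using System;

// Hypothetical enum; the values mirror the 1 / 2 / 3 encoding used above
[Flags]
public enum Presence {
    None = 0,
    First = 1,             // word occurs in the first string
    Second = 2,            // word occurs in the second string
    Both = First | Second  // 3: word occurs in both strings
}

public static class PresenceSketch {
    // Combines the two membership checks into a single flags value
    public static Presence Of(bool inFirst, bool inSecond) =>
        (inFirst ? Presence.First : Presence.None) |
        (inSecond ? Presence.Second : Presence.None);
}
```

With this, `PresenceSketch.Of(true, true)` is `Presence.Both`, whose underlying value is 3, so the enum could replace the `int` in the tuple without changing the encoding.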
If you want a wordy output:
string str1 = "Cat meet's a dog has";
string str2 = "Cat meet's a dog and a bird";
var result = MyWords(str1, str2);
// Indexed by presentAt (0..3); index 0 never occurs for words returned by MyWords
string[] options = new string[] {
    "not present",
    "present in first string not present in second string",
    "not present in first string but present in second string",
    "present in first string and present in second string"
};
var report = string.Join(Environment.NewLine, result
    .Select(pair => $"{pair.word} - {options[pair.presentAt]}"));
Console.Write(report);
Output:
Cat - present in first string and present in second string
meet's - present in first string and present in second string
a - present in first string and present in second string
dog - present in first string and present in second string
has - present in first string not present in second string
and - not present in first string but present in second string
bird - not present in first string but present in second string
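If you prefer one line per side, as in the question, the words can be grouped instead of listed one by one. This is only a sketch under my own assumptions: `GroupedDiffSketch`, `Words` and `Report` are hypothetical names, and words present in both strings are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only
public static class GroupedDiffSketch {
    private static readonly Regex WordsRegex = new Regex(@"[\p{L}']+");

    // Distinct words of a text (case-insensitive), in order of first appearance
    private static List<string> Words(string text) =>
        WordsRegex.Matches(text)
                  .Cast<Match>()
                  .Select(match => match.Value)
                  .Distinct(StringComparer.OrdinalIgnoreCase)
                  .ToList();

    // Returns at most two lines: words only in text1, then words only in text2
    public static List<string> Report(string text1, string text2) {
        List<string> words1 = Words(text1);
        List<string> words2 = Words(text2);

        var onlyInFirst = words1.Except(words2, StringComparer.OrdinalIgnoreCase).ToList();
        var onlyInSecond = words2.Except(words1, StringComparer.OrdinalIgnoreCase).ToList();

        var lines = new List<string>();
        if (onlyInFirst.Count > 0)
            lines.Add($"{string.Join(" ", onlyInFirst)} - present in first string not present in second string");
        if (onlyInSecond.Count > 0)
            lines.Add($"{string.Join(" ", onlyInSecond)} - not present in first string but present in second string");
        return lines;
    }
}
```

For the two sample strings this gives "has - present in first string not present in second string" and "and bird - not present in first string but present in second string". Note that "a" does not appear on the second line, because it occurs in both strings. For "Added" and "Edited" each word ends up on its own line.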