I'm trying to use a Dictionary<string, int> to map some words (the int value isn't really relevant). After inserting the words into the dictionary (I checked that they are there), I go over the whole document and look for specific words.
When I do that, ContainsKey returns false even if the word exists in the dictionary.
What could be the problem, and how can I fix it?
    public string RemoveStopWords(string originalDoc){
        string updatedDoc = "";
        string[] originalDocSeperated = originalDoc.Split(' ');
        foreach (string word in originalDocSeperated)
        {
            if (!stopWordsDic.ContainsKey(word))
            {
                updatedDoc += word;
                updatedDoc += " ";
            }
        }
        return updatedDoc.Substring(0, updatedDoc.Length - 1); //Remove Last Space
    }
For example: the dictionary contains stop words such as "the". When I get the word "the" from originalDoc and check whether it is missing from the dictionary, it still enters the if statement, even though both words are written exactly the same (there is no difference in case).
    Dictionary<string, int> stopWordsDic = new Dictionary<string, int>();
    string stopWordsContent = System.IO.File.ReadAllText(stopWordsPath);
    string[] stopWordsSeperated = stopWordsContent.Split('\n');
    foreach (string stopWord in stopWordsSeperated)
    {
        stopWordsDic.Add(stopWord, 1);
    }
The stop words file contains one word per line.
Thank you.
This is just a guess (it's just too long for a comment), but when you are inserting into your Dictionary, you are splitting by \n. If the actual line separator in the text file you are using is \r\n, you'd be left with \r characters at the end of your inserted keys, so ContainsKey would not find them.
So I'd start with string[] stopWordsSeperated = stopWordsContent.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None); and then Trim() each entry before adding it to the dictionary.
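To make the guess concrete, here is a minimal sketch of how the leftover \r shows up and how splitting on both separators plus trimming avoids it. The class name StopWordLoader, the Parse helper, and the sample content are made up for illustration; it also skips empty entries and uses the indexer instead of Add so a duplicate line would not throw, which are my own assumptions rather than something from your code.

```csharp
using System;
using System.Collections.Generic;

public static class StopWordLoader
{
    // Hypothetical helper: turns the raw file content into dictionary keys.
    public static Dictionary<string, int> Parse(string stopWordsContent)
    {
        var stopWordsDic = new Dictionary<string, int>();

        // Split on Windows (\r\n) and Unix (\n) line endings alike.
        string[] lines = stopWordsContent.Split(
            new string[] { "\r\n", "\n" }, StringSplitOptions.None);

        foreach (string line in lines)
        {
            // Trim removes any stray whitespace, including a lone '\r'.
            string stopWord = line.Trim();
            if (stopWord.Length > 0)
            {
                // The indexer overwrites duplicates instead of throwing like Add.
                stopWordsDic[stopWord] = 1;
            }
        }
        return stopWordsDic;
    }

    public static void Main()
    {
        // Sample content with Windows line endings (an assumption about the file).
        string content = "the\r\na\r\nof\r\n";

        // Splitting on '\n' alone leaves the '\r' attached to each word.
        string[] naive = content.Split('\n');
        Console.WriteLine(naive[0] == "the");   // False
        Console.WriteLine(naive[0] == "the\r"); // True

        Dictionary<string, int> dic = Parse(content);
        Console.WriteLine(dic.ContainsKey("the")); // True
    }
}
```

If the keys look identical in the debugger but still don't match, comparing their Length values is a quick way to confirm that an invisible character is attached.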
As a side note, if you are not using the dictionary's int values for anything, you'd be better off using a HashSet<string> and Contains instead of ContainsKey.
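A rough sketch of what that side note could look like, assuming the stop words have already been loaded and trimmed. The class name StopWordFilter and the sample words are hypothetical, and passing the set in as a parameter is my change to keep the example self-contained. It also builds the result with string.Join rather than trimming the last space with Substring, which is a swap on my part and not part of your original method.

```csharp
using System;
using System.Collections.Generic;

public static class StopWordFilter
{
    // Same idea as the original RemoveStopWords, but backed by a HashSet<string>.
    public static string RemoveStopWords(string originalDoc, HashSet<string> stopWords)
    {
        var kept = new List<string>();
        foreach (string word in originalDoc.Split(' '))
        {
            // Contains on a HashSet plays the role of ContainsKey on a Dictionary.
            if (!stopWords.Contains(word))
            {
                kept.Add(word);
            }
        }
        // Joining the kept words avoids having to cut off a trailing space.
        return string.Join(" ", kept);
    }

    public static void Main()
    {
        // Hypothetical stop words; in practice these would come from the file.
        var stopWords = new HashSet<string> { "the", "a", "of" };
        Console.WriteLine(RemoveStopWords("the end of a story", stopWords)); // end story
    }
}
```

Joining also means a document made up only of stop words comes back as an empty string instead of hitting Substring with a negative length.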