My goal is to take a file of sentences, apply some basic filtering, and output the remaining sentences to a file and the terminal. I'm using the Hunspell library.
Here's how I get sentences from the file:
public static string[] sentencesFromFile_old(string path)
{
string s = "";
using (StreamReader rdr = File.OpenText(path))
{
s = rdr.ReadToEnd();
}
s = s.Replace(Environment.NewLine, " ");
s = Regex.Replace(s, @"\s+", " ");
s = Regex.Replace(s, @"\s*?(?:\(.*?\)|\[.*?\]|\{.*?\})", String.Empty);
string[] sentences = Regex.Split(s, @"(?<=\. |[!?]+ )");
return sentences;
}
Here's the code that writes to file:
List<string> sentences = new List<string>(Checker.sentencesFromFile_old(path));
StreamWriter w = new StreamWriter(outFile);
foreach(string x in sentences)
if(Checker.check(x, speller))
{
w.WriteLine("[{0}]", x);
Console.WriteLine("[{0}]", x);
}
Here's the checker:
public static bool check(string s, NHunspell.Hunspell speller)
{
char[] punctuation = {',', ':', ';', ' ', '.'};
bool upper = false;
// Check the string length.
if(s.Length <= 50 || s.Length > 250)
return false;
// Check if the string contains only allowed punctuation and letters.
// Also disallow words with multiple consecutive caps.
for(int i = 0; i < s.Length; ++i)
{
if(punctuation.Contains(s[i]))
continue;
if(Char.IsUpper(s[i]))
{
if(upper)
return false;
upper = true;
}
else if(Char.IsLower(s[i]))
{
upper = false;
}
else return false;
}
// Spellcheck each word.
string[] words = s.Split(' ');
foreach(string word in words)
if(!speller.Spell(word))
return false;
return true;
}
The sentences are printed on the terminal just fine, but the text file cuts off mid-sentence at 2015 characters. What's up with that?
EDIT: When I remove some parts of the check method, the file is cut off at various lengths, somewhere around either 2000 or 4000 characters. Removing the spellcheck eliminates the cutoff entirely.
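The StreamWriter buffers its output, and your code never flushes or closes it, so whatever is still sitting in the buffer when the program exits never reaches the file. Flush and close the writer once you're done writing:
w.Flush();
w.Close();
The using statement (which you should also use) closes the writer automatically, even if an exception is thrown inside the block. Close already flushes the writer, so the explicit Flush call is redundant but harmless.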
using( var w = new StreamWriter(...) )
{
// Do stuff
w.Flush();
}
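If you want to see the buffering for yourself, here's a minimal sketch. The class and method names (FlushDemo, Measure) are made up for illustration and are not part of your code. It writes a few short lines, checks the file size on disk while the text is still buffered, then checks again after the writer has been flushed and closed.

```csharp
using System;
using System.IO;

// Hypothetical demo class, not part of the original code.
public static class FlushDemo
{
    // Writes lineCount short lines to path and reports the file size on disk
    // before the writer is flushed and after it has been flushed and closed.
    public static void Measure(string path, int lineCount,
                               out long beforeFlush, out long afterClose)
    {
        using (var w = new StreamWriter(path))
        {
            for (int i = 0; i < lineCount; ++i)
                w.WriteLine("[sentence {0}]", i);

            // Nothing has been flushed yet, so some or all of the text
            // may still be held in the writer's buffers.
            beforeFlush = new FileInfo(path).Length;

            w.Flush();
        } // the writer is closed here

        afterClose = new FileInfo(path).Length;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        long before, after;
        Measure(path, 10, out before, out after);
        Console.WriteLine("before flush: {0} bytes, after close: {1} bytes",
                          before, after);
        File.Delete(path);
    }
}
```

With only a handful of short lines, the first number should typically be 0, because all of the text still fits in the buffer. That is the same effect that truncates your output file: the tail end of the text is still in memory when the program exits.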