I have an HTML file with content like the following:
<HTML>
<BODY>
...
........ company's Chief Financial Officer. Now the.......
...
</BODY>
</HTML>
I am reading the content of this file using:
StringBuilder stringBuilder = new StringBuilder();
using (StreamReader sr = new StreamReader(filePath))
{
    String line = sr.ReadToEnd();
    stringBuilder.Append(line);
}
strFileContent = stringBuilder.ToString();
However, it returns the string as:
........ company�s Chief Financial Officer.���Now the.......
The HTML files are on my local system.
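You need to read the file with the same encoding that was used to create it. StreamReader assumes UTF-8 by default (unless it finds a byte order mark) and decodes the file that way, but your file is actually encoded as windows-1252 (as you said in the comments). Reading with the wrong encoding produces junk: windows-1252 stores characters such as the curly apostrophe ’ as a single byte (0x92) that is not valid UTF-8 on its own, so the UTF-8 decoder replaces it with the replacement character �. The ��� before "Now" is probably the same thing happening to non-breaking spaces (byte 0xA0 in windows-1252).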
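To see the mismatch in isolation, here is a minimal sketch that decodes the windows-1252 bytes for "company’s" both ways. It assumes .NET 5 or later, where CodePagesEncodingProvider ships with the framework; the class and member names (MismatchDemo, Cp1252Bytes, DecodeAsUtf8, DecodeAsCp1252) are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text;

// Hypothetical demo class: shows how the same bytes decode under two encodings.
public static class MismatchDemo
{
    // "company’s" as windows-1252 writes it; 0x92 is the right single
    // quotation mark (U+2019) in that code page.
    public static readonly byte[] Cp1252Bytes =
    {
        0x63, 0x6F, 0x6D, 0x70, 0x61, 0x6E, 0x79, 0x92, 0x73
    };

    // 0x92 is not a valid UTF-8 lead byte, so Encoding.UTF8 replaces it
    // with U+FFFD (the replacement character shown as �).
    public static string DecodeAsUtf8(byte[] bytes) =>
        Encoding.UTF8.GetString(bytes);

    public static string DecodeAsCp1252(byte[] bytes)
    {
        // Code page 1252 is not built in on .NET Core / .NET 5+, so register it.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetString(bytes);
    }
}
```

DecodeAsUtf8(MismatchDemo.Cp1252Bytes) yields "company\uFFFDs" (company�s, exactly the junk in the question), while DecodeAsCp1252 yields company’s.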
You should explicitly tell StreamReader which encoding the file is in. Here's how:
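// On .NET Core / .NET 5+, register the code-page provider first;
// .NET Framework has windows-1252 built in.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
var encoding = Encoding.GetEncoding(1252); // windows-1252
using (StreamReader sr = new StreamReader(filePath, encoding))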
...
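For reference, here is a self-contained sketch of the same approach wrapped in a helper. It assumes .NET 5 or later (on .NET Framework, drop the RegisterProvider line, since code page 1252 is already available there); the class name Cp1252Reader and method name ReadAllText are hypothetical.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper wrapping the answer's approach.
public static class Cp1252Reader
{
    public static string ReadAllText(string filePath)
    {
        // Make code page 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding encoding = Encoding.GetEncoding(1252); // windows-1252

        // Passing an encoding overrides StreamReader's UTF-8 default.
        // This constructor still honors a byte order mark if one is present.
        using (var sr = new StreamReader(filePath, encoding))
        {
            return sr.ReadToEnd();
        }
    }
}
```

ReadToEnd already returns the whole file, so the StringBuilder in the question is not needed. File.ReadAllText(filePath, encoding) does the same thing in a single call.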