I have a list of files on disk that I need to read and load into memory. I created the FileConfig
class shown below, which holds the metadata for each file.
public class FileConfig
{
    public string FileName { get; set; }
    public DateTime Date { get; set; }
    public string FileContent { get; set; }
    public string MD5Hash { get; set; }
}
I keep an MD5Hash string for each file so that I can later compare it with other files and work out whether a particular file has changed.
Below is the code that gets the list of files from disk and builds a sequence of FileConfig objects from it.
private IEnumerable<FileConfig> LoadFiles(string path)
{
    IList<string> files = procUtility.GetListOfFiles(path);
    // Nothing to load: end the sequence instead of yielding a null entry.
    if (files == null || files.Count == 0) { yield break; }
    foreach (var cfgPath in files)
    {
        // Skip files that disappeared after the listing was taken.
        if (!File.Exists(cfgPath)) { continue; }
        yield return new FileConfig
        {
            // GetFileName returns the last path segment, so no manual splitting is needed.
            FileName = System.IO.Path.GetFileName(cfgPath),
            Date = File.GetLastWriteTimeUtc(cfgPath),
            FileContent = File.ReadAllText(cfgPath),
            MD5Hash = procUtility.GetMD5Hash(cfgPath)
        };
    }
}
My end goal is to compare files (and also to use the file content for other purposes), so I use the MD5Hash string stored in each FileConfig to decide whether two files differ, like this:
!newFile.MD5Hash.Equals(oldFile.First().MD5Hash)
Is there a better way, for example inheriting from the FileInfo class in my FileConfig class and using each file's Length property for the comparison? Or is what I have fine?
What you have is fine. MD5 produces a hash from the file contents, so even a single changed byte yields a different hash. The hash is 128 bits long, so the chance that two different files accidentally share a hash is negligible (on the order of 1 in 2^128 for any given pair). Deliberately crafted collisions are a different matter, since MD5 is not collision-resistant, but that only matters if the files can come from someone trying to fool the comparison.
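The question does not show how procUtility.GetMD5Hash is implemented, so the following is only a minimal sketch of what such a helper might look like. The class and method names (HashSketch, ComputeMd5Hex) are hypothetical; the calls themselves (MD5.Create, ComputeHash, BitConverter.ToString) are standard .NET APIs.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashSketch
{
    // Hypothetical helper: returns the MD5 of a file as an uppercase hex string.
    public static string ComputeMd5Hex(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            // ComputeHash reads the stream to the end, so every byte of the file is processed.
            byte[] hash = md5.ComputeHash(stream);
            // BitConverter.ToString yields "AB-CD-..."; strip the dashes.
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```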
However, a byte-by-byte comparison may be faster in your case, because generating a checksum always reads and processes every byte of the file. If you do need a byte-by-byte comparison, note that File.ReadAllBytes takes a path string (use FileInfo.FullName if you are holding a System.IO.FileInfo):
File.ReadAllBytes(pathA).SequenceEqual(File.ReadAllBytes(pathB))
SequenceEqual stops at the first difference, whereas the MD5 generator always runs to the end of the file, which is why we assume the byte comparison is faster. Keep in mind that ReadAllBytes still loads both files completely before the comparison starts, so the early exit saves only comparison time, not disk reads.
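To actually stop reading at the first difference, the two files can be compared in chunks rather than loaded whole. The following is a rough sketch under that assumption; FileCompareSketch, ContentsEqual, FillBuffer and the buffer size are illustrative choices, not part of the original code.

```csharp
using System;
using System.IO;

public static class FileCompareSketch
{
    // Compares two files chunk by chunk and returns as soon as a difference is found.
    public static bool ContentsEqual(string pathA, string pathB)
    {
        const int BufferSize = 81920; // arbitrary chunk size chosen for this sketch
        using (var a = File.OpenRead(pathA))
        using (var b = File.OpenRead(pathB))
        {
            var bufA = new byte[BufferSize];
            var bufB = new byte[BufferSize];
            while (true)
            {
                int readA = FillBuffer(a, bufA);
                int readB = FillBuffer(b, bufB);
                if (readA != readB) return false; // one file ended before the other
                if (readA == 0) return true;      // both ended with no difference seen
                for (int i = 0; i < readA; i++)
                {
                    if (bufA[i] != bufB[i]) return false;
                }
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the end of the stream is reached.
    private static int FillBuffer(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

This only helps when both files are still on disk; if the old file is represented only by a stored hash, hashing the new file remains the way to compare.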
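You can also use the following on text (non-binary) files, where fileA and fileB are the two paths to compare:
File.ReadLines(fileA).SequenceEqual(File.ReadLines(fileB))
File.ReadLines reads lazily, so this stops at the first differing line. Because it compares line by line, files that differ only in their line endings are treated as equal.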
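As for comparing by length, you should never rely on that alone. Two files with different contents can easily have the same length, so the false positives (files reported as identical when they are not) from relying solely on that check are numerous, and adding it on top of a hash or byte comparison gains you little.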