I have several txt files, each containing more than 3 million lines. Each line describes a customer connection and includes a Customer ID, an IP address, and so on.
I need to find a specific IP address and get the Customer ID related to it.
I read the file, split it into an array, and search each line with a foreach, but because there are so many lines, the following error occurs:
Exception of type 'System.OutOfMemoryException' was thrown.
The txt files are compressed, so I have to decompress them first. I use the code below:
string decompressTxt = this.Decompress(new FileInfo(filePath));
char[] delRow = { '\n' };
string[] rows = decompressTxt.Split(delRow);
for (int i = 0; i < rows.Length; i++)
{
    if (rows[i].Contains(ip))
    {
        // ...
    }
}
string Decompress(FileInfo fileToDecompress)
{
    string newFileName = "";
    string newFileText = "";
    using (FileStream originalFileStream = fileToDecompress.OpenRead())
    {
        string currentFileName = fileToDecompress.FullName;
        newFileName = currentFileName.Remove(currentFileName.Length - fileToDecompress.Extension.Length);
        using (FileStream decompressedFileStream = File.Create(newFileName))
        {
            using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }
        }
        newFileText = File.ReadAllText(newFileName);
        File.Delete(newFileName);
    }
    return newFileText;
}
Okay, so there are a lot of things you're doing that aren't necessary, even before we get to why you're running out of memory.
First off, you don't need an intermediate file for decompression; you can read off the GZipStream directly. Did you decompress to a file first because you thought you had to use File.ReadAllText to read the text? That's unnecessary. When you want to read text from a stream, you can just wrap it in a StreamReader (which is what File.ReadAllText uses underneath).
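To make that concrete, here's a minimal sketch of what your entire Decompress method boils down to once the temporary file is taken out (it still loads the whole text into memory, which the next point addresses; fileToDecompress is your existing FileInfo):

// Roughly what your Decompress method does, minus the temporary file:
// the StreamReader decodes text straight off the decompression stream.
using (var gzip = new GZipStream(fileToDecompress.OpenRead(), CompressionMode.Decompress))
using (var reader = new StreamReader(gzip))
{
    string allText = reader.ReadToEnd(); // still the whole file in one string
}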
The reader can also be used to read line by line, without having to fit the entire file in memory; only one line is held at a time. Just call ReadLine() until it returns null.
Putting it all together, here's code that decompresses the data and reads it one line at a time, without having to split anything. Not only does it scale to very large files, it's also much faster.
// Decompress on the fly and read the text line by line;
// only one line is ever held in memory.
using var stream = new GZipStream(fileToDecompress.OpenRead(), CompressionMode.Decompress);
using var reader = new StreamReader(stream);

string? line;
while ((line = reader.ReadLine()) != null)
{
    if (line.Contains(ip))
    {
        // etc.
    }
}
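If you also need to pull the Customer ID out of the matching line, a sketch like the following would work, assuming (hypothetically) that the fields are comma-separated and that the Customer ID is the first column; adjust the delimiter and index to your actual format:

// Hypothetical line format: "customerId,ip,..." - adjust Split/index as needed.
string? FindCustomerId(FileInfo fileToDecompress, string ip)
{
    using var stream = new GZipStream(fileToDecompress.OpenRead(), CompressionMode.Decompress);
    using var reader = new StreamReader(stream);

    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        if (!line.Contains(ip))
            continue;

        string[] fields = line.Split(',');
        return fields[0]; // assumed: Customer ID is the first field
    }

    return null; // IP not found in this file
}

Since you have several files, you could call something like this for each one (e.g. enumerating them with Directory.EnumerateFiles) and stop at the first non-null result.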