I'm trying to read through a huge text file (approximately 10 GB) and find the last occurrence of a string.
For example, below is a sample of five lines; the 2nd and 5th contain the same search string (126692MSIE7.0).
I want to take the last one, since it is the latest, and output it to a text file, reading the input with a StreamReader.
Am I better off using Regex or LastIndexOf to determine whether it is the last occurrence?
I have a lot of these searches to do, so I would put the search strings in some kind of array and have it search from the bottom up to improve performance.
Can someone point me in the right direction?
GET/a/users/115656WindowsNT6.1;Trident
GET/a/users/126692MSIE7.0;WindowsNT6.1
GET/a/users/77562WindowsNT6.1;WOW64;Tr
GET/a/users/35650WindowsNT6.1;WOW64;Tr
GET/a/users/126692MSIE7.0;WindowsNT6.2
I believe File.ReadLines is one of the best ways to read large files. According to MSDN:
The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.
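Because ReadLines streams, the last match can also be found in a single pass without collecting any list. This is a minimal sketch, assuming a plain substring match is enough (no Regex); LastMatchingLine is a hypothetical helper name. It runs against the five sample lines from the question; for the real file you would pass IO.File.ReadLines("MyLargFile.txt") instead of the array.

```vbnet
Imports System
Imports System.Collections.Generic

Module LastMatchSketch

    ' Hypothetical helper: returns the last line that contains term, or Nothing
    ' when no line matches. Only the most recent match is held in memory, so a
    ' streamed source such as IO.File.ReadLines never has to fit in RAM.
    Function LastMatchingLine(lines As IEnumerable(Of String), term As String) As String
        Dim lastMatch As String = Nothing
        For Each line As String In lines
            ' Ordinal IndexOf is a culture-insensitive substring search;
            ' a fixed search string does not need Regex.
            If line.IndexOf(term, StringComparison.Ordinal) >= 0 Then
                lastMatch = line
            End If
        Next
        Return lastMatch
    End Function

    Sub Main()
        ' The five sample lines from the question.
        Dim sample As String() = {
            "GET/a/users/115656WindowsNT6.1;Trident",
            "GET/a/users/126692MSIE7.0;WindowsNT6.1",
            "GET/a/users/77562WindowsNT6.1;WOW64;Tr",
            "GET/a/users/35650WindowsNT6.1;WOW64;Tr",
            "GET/a/users/126692MSIE7.0;WindowsNT6.2"
        }
        ' Prints GET/a/users/126692MSIE7.0;WindowsNT6.2
        Console.WriteLine(LastMatchingLine(sample, "126692MSIE7"))
    End Sub

End Module
```

This still reads the whole file once; it does not search from the bottom up, but memory use stays flat regardless of how many lines match.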
Based on this, I wrote the following code; I hope it helps:
Imports System.Linq
Dim myList As List(Of String) = IO.File.ReadLines("MyLargFile.txt").Where(Function(s) s.Contains("126692MSIE7")).ToList() ' stream the file, keep every matching line
This code returns a list of the matching lines.
Output:
myList(0) = "GET/a/users/126692MSIE7.0;WindowsNT6.1"
myList(1) = "GET/a/users/126692MSIE7.0;WindowsNT6.2"
If you only need the last matching line, you can use the Last method:
Dim last As String = myList.Last
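Calling Last on myList still requires every matching line to be held in the list first, and Last throws an InvalidOperationException when the list is empty. IO.File.ReadLines(...).LastOrDefault(predicate) streams the file once and returns Nothing when nothing matches. Since the question mentions running many of these searches, one pass can also track the last match for several search strings at once. The sketch below assumes plain substring matches; LastMatches and the output file name results.txt are hypothetical. Writing uses a StreamWriter, since a StreamReader can only read.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module MultiSearchSketch

    ' Hypothetical helper: one pass over lines, remembering for each search term
    ' the last line that contains it. Terms with no match are absent from the result.
    Function LastMatches(lines As IEnumerable(Of String), terms As IEnumerable(Of String)) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)()
        For Each line As String In lines
            For Each term As String In terms
                If line.IndexOf(term, StringComparison.Ordinal) >= 0 Then
                    ' A later match simply overwrites the earlier one.
                    result(term) = line
                End If
            Next
        Next
        Return result
    End Function

    Sub Main()
        ' Sample lines from the question; for the real file use IO.File.ReadLines("MyLargFile.txt").
        Dim sample As String() = {
            "GET/a/users/115656WindowsNT6.1;Trident",
            "GET/a/users/126692MSIE7.0;WindowsNT6.1",
            "GET/a/users/77562WindowsNT6.1;WOW64;Tr",
            "GET/a/users/35650WindowsNT6.1;WOW64;Tr",
            "GET/a/users/126692MSIE7.0;WindowsNT6.2"
        }
        Dim terms As String() = {"126692MSIE7", "77562WindowsNT6.1"}
        Dim found As Dictionary(Of String, String) = LastMatches(sample, terms)

        ' Write the last match for each term to a text file (hypothetical name).
        Using writer As New StreamWriter("results.txt")
            For Each term As String In terms
                Dim match As String = Nothing
                If found.TryGetValue(term, match) Then
                    writer.WriteLine(term & ": " & match)
                End If
            Next
        End Using
    End Sub

End Module
```

This performs the same number of substring checks as running each search separately, but reads the 10 GB file only once instead of once per term.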