I read a file line by line and insert the data into a database using Entity Framework. The reading is very slow. The file has almost 6 million rows, and I need to improve the performance of reading it. The file is a dictionary of words, and I need to insert these words into a database table. Below are several lines of that file.
390201
ТАТАМИ NOUN,inan,neut,Fixd sing,nomn
ТАТАМИ NOUN,inan,neut,Fixd sing,gent
ТАТАМИ NOUN,inan,neut,Fixd sing,datv
ТАТАМИ NOUN,inan,neut,Fixd sing,accs
ТАТАМИ NOUN,inan,neut,Fixd sing,ablt
ТАТАМИ NOUN,inan,neut,Fixd sing,loct
ТАТАМИ NOUN,inan,neut,Fixd plur,nomn
ТАТАМИ NOUN,inan,neut,Fixd plur,gent
ТАТАМИ NOUN,inan,neut,Fixd plur,datv
ТАТАМИ NOUN,inan,neut,Fixd plur,accs
ТАТАМИ NOUN,inan,neut,Fixd plur,ablt
ТАТАМИ NOUN,inan,neut,Fixd plur,loct
390202
ОРИГАМИ NOUN,inan,neut,Fixd sing,nomn
ОРИГАМИ NOUN,inan,neut,Fixd sing,gent
ОРИГАМИ NOUN,inan,neut,Fixd sing,datv
ОРИГАМИ NOUN,inan,neut,Fixd sing,accs
ОРИГАМИ NOUN,inan,neut,Fixd sing,ablt
ОРИГАМИ NOUN,inan,neut,Fixd sing,loct
ОРИГАМИ NOUN,inan,neut,Fixd plur,nomn
ОРИГАМИ NOUN,inan,neut,Fixd plur,gent
ОРИГАМИ NOUN,inan,neut,Fixd plur,datv
ОРИГАМИ NOUN,inan,neut,Fixd plur,accs
My code for parsing that file is below:
public static void parseFileFromToSegment(int beginId, int endId)
{
    using (var db = new Context())
    {
        string theWordFromFile;
        string wordData;
        int wordIdFromFile = 1;
        int tempWordId;
        IEnumerable<string> allFileLines = File.ReadLines(fileName);
        allFileLines = allFileLines.SkipWhile(n => n != beginId.ToString());
        foreach (string line in allFileLines)
        {
            if (string.IsNullOrEmpty(line))
                continue;
            if (!string.IsNullOrEmpty(line) && Int32.TryParse(line, out tempWordId))
            {
                if (tempWordId < beginId)
                {
                    continue;
                }
                if (tempWordId > endId)
                    break;
                wordIdFromFile = tempWordId;
                if (wordIdFromFile % 100 == 0)
                    Console.WriteLine("Current id - " + wordIdFromFile);
                continue;
            }
            theWordFromFile = line.Substring(0, line.IndexOf('\t'));
            wordData = line.Substring(line.IndexOf('\t')).Trim();
            TheWord theWord = new TheWord { WordFormId = wordIdFromFile, word = theWordFromFile, word_form_data = wordData };
            db.TheWords.Add(theWord);
        }
        db.SaveChanges();
        Console.WriteLine("saved");
    }
}
So reading is very slow. What can I do to improve performance? Thank you.
It's not the file reads that are slow. It's the DB inserts. You could use pure ADO.NET with a DataAdapter to insert the rows (using batching), or the SqlBulkCopy class.
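For illustration, here is a minimal sketch of the SqlBulkCopy route. It is not a drop-in replacement for the code in the question, and several names are assumptions: the destination table is assumed to be called TheWords, its columns are assumed to match the entity properties (WordFormId, word, word_form_data), and the class WordBulkLoader, its methods, and the chunk size are all hypothetical. It also assumes SQL Server and the System.Data.SqlClient provider (on newer .NET versions that may mean adding a NuGet package, or switching to Microsoft.Data.SqlClient). Rows are sent in chunks so that the 6 million lines never sit in memory at once.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

public static class WordBulkLoader
{
    // In-memory buffer whose columns mirror the TheWord entity (assumed names).
    public static DataTable CreateTable()
    {
        var table = new DataTable();
        table.Columns.Add("WordFormId", typeof(int));
        table.Columns.Add("word", typeof(string));
        table.Columns.Add("word_form_data", typeof(string));
        return table;
    }

    // Handles one line of the file. A line holding only an integer updates
    // currentId; a "word<TAB>data" line becomes a row. Returns true when a
    // row was added to the table.
    public static bool TryAddRow(DataTable table, string line, ref int currentId)
    {
        if (string.IsNullOrWhiteSpace(line))
            return false;
        if (int.TryParse(line, out int id))
        {
            currentId = id;
            return false;
        }
        int tab = line.IndexOf('\t');
        if (tab < 0)
            return false; // malformed line, skipped in this sketch
        table.Rows.Add(currentId, line.Substring(0, tab), line.Substring(tab + 1).Trim());
        return true;
    }

    // Sends the buffered rows to the server in one bulk operation.
    public static void BulkInsert(string connectionString, DataTable table)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "TheWords"; // assumed table name
            foreach (DataColumn column in table.Columns)
                bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);
            bulk.WriteToServer(table);
        }
    }

    // Streams the file and flushes the buffer every chunkSize rows.
    public static void Load(string path, string connectionString, int chunkSize = 50000)
    {
        var table = CreateTable();
        int currentId = 0;
        foreach (string line in File.ReadLines(path))
        {
            if (TryAddRow(table, line, ref currentId) && table.Rows.Count >= chunkSize)
            {
                BulkInsert(connectionString, table);
                table.Clear();
            }
        }
        if (table.Rows.Count > 0)
            BulkInsert(connectionString, table);
    }
}
```

Usage would be something like WordBulkLoader.Load(fileName, connectionString). The beginId/endId filtering from the question is left out to keep the sketch short; it could be added inside the loop in Load. The chunk size of 50000 is an arbitrary starting point that is worth tuning against your own server.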