Search code examples
c#filereadlines

very slow reading of txt file using File.ReadLines in c#


I read the file line by line and insert this data to DB using entity framework. The reading is very slow. The file is almost 6 millions of rows and I need encrease the perfomance of reading of the file. It is a dictionary of words in this file and I need to insert these words in database table. Below are several lines of that file.

390201
ТАТАМИ  NOUN,inan,neut,Fixd sing,nomn
ТАТАМИ  NOUN,inan,neut,Fixd sing,gent
ТАТАМИ  NOUN,inan,neut,Fixd sing,datv
ТАТАМИ  NOUN,inan,neut,Fixd sing,accs
ТАТАМИ  NOUN,inan,neut,Fixd sing,ablt
ТАsing,gent
ОРИГАМИ NOUN,inan,neut,Fixd ТАМИ    NOUN,inan,neut,Fixd sing,loct
ТАТАМИ  NOUN,inan,neut,Fixd plur,nomn
ТАТАМИ  NOUN,inan,neut,Fixd plur,gent
ТАТАМИ  NOUN,inan,neut,Fixd plur,datv
ТАТАМИ  NOUN,inan,neut,Fixd plur,accs
ТАТАМИ  NOUN,inan,neut,Fixd plur,ablt
ТАТАМИ  NOUN,inan,neut,Fixd plur,loct

390202
ОРИГАМИ NOUN,inan,neut,Fixd sing,nomn
ОРИГАМИ NOUN,inan,neut,Fixd sing,datv
ОРИГАМИ NOUN,inan,neut,Fixd sing,accs
ОРИГАМИ NOUN,inan,neut,Fixd sing,ablt
ОРИГАМИ NOUN,inan,neut,Fixd sing,loct
ОРИГАМИ NOUN,inan,neut,Fixd plur,nomn
ОРИГАМИ NOUN,inan,neut,Fixd plur,gent
ОРИГАМИ NOUN,inan,neut,Fixd plur,datv
ОРИГАМИ NOUN,inan,neut,Fixd plur,accs

My code for parsing of that file is below:

public static void parseFileFromToSegment(int beginId, int endId)
    {
    using (var db = new Context())
    {
        string theWordFromFile;
        string wordData;
        int wordIdFromFile = 1;
        int tempWordId;

        IEnumerable<string> allFileLines = File.ReadLines(fileName);
        allFileLines = allFileLines.SkipWhile(n => n != beginId.ToString());
        foreach (string line in allFileLines)
        {
            if (string.IsNullOrEmpty(line))
                continue;
            if (!string.IsNullOrEmpty(line) && Int32.TryParse(line, out tempWordId))
            {
                if (tempWordId < beginId)
                {
                    continue;
                }
                if (tempWordId > endId) 
                    break;

                wordIdFromFile = tempWordId;
                if (wordIdFromFile % 100 == 0)
                    Console.WriteLine("Current id - " + wordIdFromFile);
                continue;
            }

            theWordFromFile = line.Substring(0, line.IndexOf('\t'));
            wordData = line.Substring(line.IndexOf('\t')).Trim();
            TheWord theWord = new TheWord { WordFormId = wordIdFromFile, word = theWordFromFile, word_form_data = wordData };

            db.TheWords.Add(theWord);
        }
        db.SaveChanges();
        Console.WriteLine("saved");
    }
}

So the speed of reading is very slow. What can I do to improve performance? Thank you


Solution

  • It's not the file reads that are slow. It's the DB inserts.

    You could use pure ADO.NET with a DataAdapter to insert the rows (using batching) or the SQLBulkCopy class (example).