Which file should be used for the eval step in textsum?

I am working on TensorFlow's textsum model, which does text summarization. I was following the commands in the README at github/textsum. It said that a file named validation, in the data folder, is to be used in the eval step, but there was no validation file in the data folder.

I thought I would make one myself, and later realized that it should be a binary file. So I needed to prepare a text file that would then be converted to binary, but that text file has to be in a specific format. Is it the same format as the file used in the train step? Can I use the same file for both the train and eval steps? The steps I followed were:

Step 1: Trained the model using the vocab file marked as "updated" for the toy dataset.

Step 2: Training ran for a while and then got "Killed" at running_avg_loss: 3.590769.

Step 3: Using the same data and vocab files for eval as I had used for training, I ran eval. It keeps running, with running_avg_loss between 6 and 7.
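For context on what those numbers mean: as far as I can tell from textsum's seq2seq_attention.py, running_avg_loss is not a single batch loss but an exponential moving average with decay 0.999, capped at 12. The sketch below mirrors that calculation over a list of per-step losses; the function name and the list-based interface are my own, not textsum's.

```python
def running_avg_loss(losses, decay=0.999, cap=12.0):
    """Exponential moving average of per-step losses, sketching how
    textsum's _RunningAvgLoss helper appears to compute the printed value."""
    avg = 0.0
    for loss in losses:
        if avg == 0:
            avg = loss  # first step: start from the raw loss
        else:
            avg = avg * decay + (1 - decay) * loss
        avg = min(avg, cap)  # the reported value is capped at 12
    return avg
```

With decay 0.999 the average effectively reflects roughly the last 1000 steps, so 3.59 (train) and 6-7 (eval) are smoothed values rather than single-batch losses.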

I am unsure about step 3: should the same files be used or not?

Solution

You don't have to run eval unless you are actually testing your model after training, to see how it performs on a set of data it has never seen before. Running eval on the same files you trained on defeats that purpose. I have also been using eval to determine whether I am starting to overfit the data.
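One way to turn "am I starting to overfit?" into a concrete check is to record the train and eval loss at each checkpoint (for example from the console output or TensorBoard) and watch for eval loss rising while train loss keeps falling. This is a generic heuristic, not something textsum provides; the function name and the `patience` parameter are hypothetical.

```python
def looks_overfit(train_losses, eval_losses, patience=3):
    """Heuristic: flag overfitting when eval loss has risen for `patience`
    consecutive checkpoints while training loss kept falling.

    Both lists hold one loss value per checkpoint, in the same order.
    """
    if len(train_losses) != len(eval_losses):
        raise ValueError("need one train and one eval loss per checkpoint")
    if len(eval_losses) < patience + 1:
        return False  # not enough history yet
    recent_eval = eval_losses[-(patience + 1):]
    recent_train = train_losses[-(patience + 1):]
    eval_rising = all(b > a for a, b in zip(recent_eval, recent_eval[1:]))
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    return eval_rising and train_falling
```

A larger `patience` makes the check less sensitive to noise in the eval loss, at the cost of noticing overfitting later.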

So you will usually take 20-30% of your overall dataset and set it aside for the eval process, then train on the rest. Once training is complete, you can run decode right away if you like, or you can first run eval against the 20-30% you set aside from the start. Once you are comfortable with the results, run decode to get the summaries.
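A minimal sketch of setting that data aside before converting anything to binary, assuming your text-format data has one example per line (as in the toy text_data file). The function names and file paths are hypothetical.

```python
import random


def split_train_eval(lines, eval_fraction=0.2, seed=0):
    """Shuffle text-format examples and hold out `eval_fraction` of them
    for eval. Returns (train, held_out); the two lists never overlap."""
    if not 0 < eval_fraction < 1:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_eval = int(round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def split_file(src_path, train_path, eval_path, eval_fraction=0.2):
    """Split one text file into separate train and eval text files."""
    with open(src_path) as f:
        examples = [line for line in f if line.strip()]
    train, held_out = split_train_eval(examples, eval_fraction)
    with open(train_path, "w") as f:
        f.writelines(train)
    with open(eval_path, "w") as f:
        f.writelines(held_out)
```

You would then convert each output file to binary separately, so training only ever sees the train file and eval only sees the held-out one.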

Your eval data should be in the same binary format as your training data.
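For reference, textsum's data.py appears to read the binary file as a sequence of records, each an 8-byte length packed with struct format 'q' followed by that many bytes of a serialized tf.Example (with features such as article and abstract). The sketch below shows only that framing; to keep it runnable without TensorFlow it takes payloads that are already serialized, so in practice you would pass in something like `example.SerializeToString()`. The helper names are my own.

```python
import io
import struct


def write_records(stream, payloads):
    """Write each payload as an 8-byte length ('q') followed by its bytes."""
    for payload in payloads:
        stream.write(struct.pack("q", len(payload)))
        stream.write(payload)


def read_records(stream):
    """Yield payloads from a stream written by write_records."""
    while True:
        len_bytes = stream.read(8)
        if not len_bytes:
            break  # clean end of file
        (length,) = struct.unpack("q", len_bytes)
        yield stream.read(length)


def round_trip(payloads):
    """Write payloads to an in-memory buffer and read them back."""
    buf = io.BytesIO()
    write_records(buf, payloads)
    buf.seek(0)
    return list(read_records(buf))
```

Because 'q' uses native byte order, a file written this way is only guaranteed to read back correctly on a machine with the same endianness.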