Description of the current situation:
I have an excel file of approximately 315 columns and 4000 rows. The file contains the answers to a 300-question questionnaire. The data format is as follows:
(Headers) A | B | C | D | E | F | Q.1 | Q.2 | ... | Q.300 |
(FirstRow) Info of first participant | AnswerCode for every Q |
The columns A to F contain contain info on every participant, while the columns Q.1 to Q.300 contain the respective answer code to each question. After storing the file as a large DataTable:
I need to load all 4000 rows on an existing database table, but before I do that I must edit the data format. The end result must become:
ParticipantCode | QuestionCode | AnswerCode | DateOfRegistration
00001 | 0001 | 1234567 | yyyy-MM-dd HH:mm:ss
... | ... | ... | ...
00001 | 0300 | 1234567 | yyyy-MM-dd HH:mm:ss
00002 | 0001 | 1234567 | yyyy-MM-dd HH:mm:ss
... | ... | ... | ...
04000 | 0300 | 1234567 | yyyy-MM-dd HH:mm:ss
So every row of the original ExcelDataTable is transformed into 300 rows in the FinalDataTable. In this way, the FinalDataTable will have about 1.2 million rows.
What Have I implemented so far:
Private Function MyFunction()
For Each ExcelRow As DataRow In ExcelDataTable.Rows
For Each ExcelColumn As DataColumn In ExcelDataTable.Columns
QuestionCodeFound = False
ExcelColumnNameRaw = ExcelColumn.ColumnName.ToString.Trim
If ExcelColumnNameRaw.StartsWith("Q") Then
' Correct the headers
ExcelColumnSplit = ExcelColumnNameRaw.Split("#")
ExcelColumnName = String.Concat(ExcelColumnSplit(0), ExcelColumnSplit(1))
SelectedRowFromDT = QuestionCodeAndQuestionIDDataTable.Select("QuestionID = '" + ExcelColumnName + "'")
' Search for "_", because some questions are different
If SelectedRowFromDT.Length > 0 Then
QuestionCodeFound = True
Else
Dim ExcelColumnSplitForMult As String()
ExcelColumnSplitForMult = ExcelColumnName.Split("_")
SelectedRowFromDT = QuestionCodeAndQuestionIDDataTable.Select("QuestionID = '" + ExcelColumnSplitForMult(0).ToString + "'")
If SelectedRowFromDT.Length > 0 Then
QuestionCodeFound = True
End If
End If
If QuestionCodeFound Then
Dim QuestionCode As String
Dim QuestionTypeDataTable As DataTable
Dim QuestionType As String
' Get the Question Type from the respective table
QuestionType = String.Empty
QuestionCode = SelectedRowFromDT(0).Item("QuestionCode").ToString
QuestionTypeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)
If QuestionTypeDataTable.Rows.Count > 0 Then
QuestionType = QuestionTypeDataTable.Rows(0).Item(0).ToString.Trim
End If
' Fix the Date Format
DateRaw = ExcelRow.Item(1).ToString
DateSplit = DateRaw.Split("/")
If DateSplit(0).Length = 1 Then
DateSplit(0) = String.Concat("0", DateSplit(0))
End If
If DateSplit(1).Length = 1 Then
DateSplit(1) = String.Concat("0", DateSplit(1))
End If
DateText = String.Concat(DateSplit(0), "/", DateSplit(1), "/", DateSplit(2))
DateRegistration = DateTime.ParseExact(DateText, "MM/dd/yyyy", CultureInfo.InvariantCulture)
DateRegistrationReformed = DateRegistration.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
DateRegFinal = DateTime.ParseExact((DateRegistrationReformed + " " + "10:00:00").ToString, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture)
Dim AnswerValue As String
Dim AnswerCode As String
Dim AnswerCodeDataTable As DataTable
Dim QuestionWasAnswer As String
Dim AnswerValueRow() As DataRow = ExcelDataTable.Select("ParticipantCode = '" + ExcelRow.Item(2).ToString + "'")
AnswerCodeDataTable = New DataTable
AnswerValue = ""
QuestionWasAnswer = "0"
' Complete "QuestionWasAnswer" field for all questions and retrieve the AnswerCode for the answer given by each participant
If AnswerValueRow.Length > 0 And AnswerValueRow(0).Item(ExcelColumnNameRaw).GetType IsNot GetType(DBNull) Then
If Not (QuestionType.Equals("02") Or QuestionType.Equals("03")) Then
AnswerValue = AnswerValueRow(0).Item(ExcelColumnNameRaw)
QuestionWasAnswer = "1"
ElseIf QuestionType.Equals("02") Or QuestionType.Equals("03") Then
Dim ExcelColumnSplitForMultSecond As String()
Dim MultAnswerValue As String
ExcelColumnSplitForMultSecond = ExcelColumnName.Split("_")
MultAnswerValue = AnswerValueRow(0).Item(ExcelColumnNameRaw).ToString.Trim
AnswerValue = ExcelColumnSplitForMultSecond(1).ToString
If MultAnswerValue.Equals("1") Then
QuestionWasAnswer = "1"
ElseIf MultAnswerValue.Equals("2") Then
QuestionWasAnswer = "2"
End If
End If
' Search in the Answers table for the existing AnswerCode
SQLString = String.Format("SELECT Answers.AnswerCode
FROM Answers
WHERE Answers.QuestionCode = '{0}'
AND (Answers.AnswerNumber = '{1}' OR Answers.Answer = '{1}')", QuestionCode, AnswerValue)
AnswerCodeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)
If AnswerCodeDataTable.Rows.Count > 0 Then
AnswerCode = AnswerCodeDataTable.Rows(0).Item(0).ToString
FormattedDataTable.Rows.Add(ParticipantAnswerCode, ExcelRow.Item(2), QuestionCode, AnswerCode, QuestionWasAnswer, DateRegFinal)
ParticipantAnswerCode = Convert.ToInt32(ParticipantAnswerCode + 1).ToString.PadLeft(ParticipantAnswerCodeFieldLength, "0")
Else
' If a given answer does not exist, save it in the respective table and then try again
Dim AnswerCodeLength = GetLengthFromSqlDataBase(My.Settings.ConnectionString, "Answers", "AnswerCode")
Dim NextAnswerCode = CalculateNextAnswerCode(AnswerCodeLength)
Dim NestAnswerNumber = CalculateNextAnswerNumber(QuestionCode)
SaveNewAnswer(NextAnswerCode, QuestionCode, NestAnswerNumber, AnswerValue)
SQLString = String.Format("SELECT Answers.AnswerCode
FROM Answers
WHERE Answers.QuestionCode = '{0}'
AND Answers.Answer = '{1}'", QuestionCode, AnswerValue)
AnswerCodeDataTable = SearchInSql(My.Settings.ConnectionString, SQLString)
If AnswerCodeDataTable.Rows.Count > 0 Then
AnswerCode = AnswerCodeDataTable.Rows(0).Item(0).ToString
FormattedDataTable.Rows.Add(ParticipantAnswerCode, ExcelRow.Item(2), QuestionCode, AnswerCode, QuestionWasAnswer, DateRegFinal)
ParticipantAnswerCode = Convert.ToInt32(ParticipantAnswerCode + 1).ToString.PadLeft(ParticipantAnswerCodeFieldLength, "0")
End If
End If
End If
End If
End If
Next
Next
Return FormattedDataTable
End Function
After that, I bulk insert the FinalDataTable on the DB.
The problem I am facing:
Using the current program I built, every row in the ExcelDataTable takes about 40 seconds to transform into 300 rows in the FinalDataTable. If I try to load all 4000 rows, it will take more than 40 hours to transform the entire datatable. I need to find a faster way to do this.
As mentioned, there isn't much to go off of on this with what has been provided.
I'm sure there are more helpful fixes to consider but I wanted to put my two cents in about the For Loops.
I recommend switching the
For Each
statements with
For i as integer = 0 to ExcelDataTable.Rows.Count - 1
I've read that For Each is not as performance-friendly as it gathers each "row" as a collection, therefore increasing the overhead per loop.
Here is a SO post about this subject:
Major difference between 'for each' and 'for' loop in .NET
Not sure if that will make a difference for you but thought I would recommend it anyway.