Search code examples
excelvbaadodbjetms-jet-ace

How can I prevent Microsoft ACE and JET in VB6 duplicating data from the first row in Excel spreadsheet?


I'm dealing with a legacy application written in VB6 which reads in Excel spreadsheets and inserts them into a database.
It works for the most part, but if the data in the spreadsheet does not start on the first row, the first data row gets duplicated.
Say for example the first 3 rows in the spreadsheet are blank, the top four rows of data look like the following:

_|  A  |  B  |  C  |  D  |  E  |  F  |  G  |
1|     |     |     |     |     |     |     |
2|     |     |     |     |     |     |     |
3|     |     |     |     |     |     |     |
4|  99 |Text1|Text2|Text3|Text4|Text5|  77 |

The application connects to the Excel spreadsheet and reads it in using the following code:

Public Function obtainConnectionExcel(sql_conn, uid) As Variant
Dim cn As Object

Set cn = CreateObject("ADODB.Connection")
On Error Resume Next

cn.Provider = "Microsoft.ACE.OLEDB.12.0"
cn.Properties("Extended Properties").Value = "Excel 12.0;ReadOnly=True;HDR=No;IMEX=1"

If (Err <> 0) Then
    cn.Provider = "Microsoft.Jet.OLEDB.4.0"
    cn.Properties("Extended Properties").Value = "Excel 8.0;ReadOnly=True;HDR=No;IMEX=1"
End If

On Error Resume Next
cn.open getSpreadsheetPath(sql_conn, uid)
Set obtainConnectionExcel = cn
Exit Function
End Function
.....
Public Function extractAllData(parameters) As String

..... 'Variable declarations etc
On Error Resume Next
Set dbo_conn = obtainConnectionExcel(sql_conn, uid)
If Err <> 0 Then
    ....'logs error, goes to error handler
End If
On Error GoTo ErrorHandler

If (dbo_conn.State = 1) Then
    rownumber = 1

    Do While rownumber <= numberOfRowsToGet

        For x = lettercount To lettercount + lettercount_offset

            letter = Chr(x)
            sSql = "SELECT * FROM [" & worksheet & "$" & letter & rownumber & ":" & letter & rownumber & "]"
            On Error Resume Next
            Set rs = dbo_conn.execute(sSql)

            If (Not rs.EOF) Then
                'inserts the data into the db
            End If

        Next x

        rownumber = rownumber + 1
    Loop

    .... 'Post processing

Exit Function

....'Error handlers

End Function

That should be the relevant code. The issue occurs at the lines:

sSql = "SELECT * FROM [" & worksheet & "$" & letter & rownumber & ":" & letter & rownumber & "]"
        On Error Resume Next
        Set rs = dbo_conn.execute(sSql)

When the data is read in, regardless of whether we are using JET or ACE, the data is returned this way:

_|  A  |  B  |  C  |  D  |  E  |  F  |  G  |
1|  99 |     |     |     |     |     |  77 |
2|  99 |     |     |     |     |     |  77 |
3|  99 |Text1|Text2|Text3|Text4|Text5|  77 |
4|  99 |Text1|Text2|Text3|Text4|Text5|  77 |

I have tried connecting to the spreadsheet and getting the data in a multitude of ways, but nothing seems to work - either the connection would fail, or the data would be nothing but nulls.
I have found some work-arounds - for example, if I enter a space character into cell A1, the issue no longer occurs. However, I would like a program-based solution, rather than having to tell users to do extra steps to avoid this.
It only copies the first row of data. If the data in the cell is a number, then it copies the data into every cell above it in that column, if it is text then it only goes up by one level.
An interesting note is that, if I alter the spreadsheet to say that all of the data is text, it then copies every cell as if they are numbers (i.e. into every single cell above, rather than one single row)

All in all this is rather irritating - since I don't have any luck when searching for this issue, I can only conclude that we are doing something wrong, or very few people have bothered with this type of test data.

[edit] After some investigation I have made some headway in solving this - "The provider assumes that your table of data begins with the upper-most, left-most, non-blank cell on the specified worksheet" (http://support.microsoft.com/default.aspx?scid=kb;en-us;257819 ). This is confirmed if I use a statement to select the entire worksheet - it only returns the block of data.
So when I select any cell that is outside of that range, the provider, rather than doing something sensible like returning null, returns the data from the upper-most non-empty cell from that particular column.
I could hypothetically change the system so that it simply grabs all of the data and assumes that the upper-most left-most cell is cell A1, but that would break compatibility with data that is already out there.
What I now need is a way to get the cell-references for the data that is returned, so I can treat it appropriately, or a way to force this to no longer happen.


Solution

  • To answer my own question: It looks like you can't. You can, however, try to code around the issue.

    The JET and ACE providers both take the upper-most left-most non-empty cell as being the start of the dataset (http://support.microsoft.com/default.aspx?scid=kb;en-us;257819 ) So, when you try to get the value from a cell that occurs before the start of that dataset, rather than doing something sensible and returning null, the providers return a guess based upon the top row of data.

    I could find no way to get the cell-references of the returned dataset from the provider - it labelled everything as F1, F2 etc. ("Field 1", "Field 2")

    So there were two remaining solutions:

    1) Grab the entire dataset at once, assume the data starts at A1, and insert it into the DB using that assumption. This unfortunately would result in breaking compatability with pre-existing data.

    2) Programmatically figure out the cell references, and input the data correctly. I did this using the following abbreviated code

    sSql = "SELECT * FROM [" & worksheet & "$]"
    Set rs = dbo_conn.execute(sSql)
    
    rownumber = 1
    If Not rs.EOF Then
        oledata_array = rs.GetRows()
    Else
        ReDim oledata_array(0, 0)
    End If
    
    Do While rownumber <= numberOfRowsToGet
        col_number = 1
        For x = lettercount To lettercount + lettercount_offset
    
            letter = Chr(x)
            sSql = "SELECT * FROM [" & worksheet & "$" & letter & rownumber & ":" & letter & rownumber & "]"
            On Error Resume Next
            Set rs = dbo_conn.execute(sSql)
            If Not rs.EOF Then
                If rs(0) <> "" Then
                    If x < furthest_left Then
                        furthest_left = x
                    End If
                    If x > furthest_right Then
                        furthest_right = x
                    End If
                    If rownumber > bottom_of_set Then
                        bottom_of_set = rownumber
                    End If
                Else
                End If
            End If
    
            col_number = col_number + 1
    
        Next x
    
        rs.MoveNext
        rownumber = rownumber + 1
    Loop
    rs.Close
    
    top_of_set = bottom_of_set - UBound(oledata_array, 2)
    
    If CLng(UBound(oledata_array, 1)) <> CLng(furthest_right - furthest_left) Then
        'log the fact that there is a discrepency, and continue
    End If
    
    'now have the co-ords of the "square" of data as it occurs in the spreadsheet
    
    rownumber = 1
    row_index = 0
    Do While rownumber <= numberOfRowsToGet
        col_number = 1
        For x = lettercount To lettercount + lettercount_offset
    
            letter = Chr(x)
    
            'construct the first chunk of the sql insert string
    
            If (x <= furthest_right) And (x >= furthest_left) And (rownumber <= bottom_of_set) And (rownumber >= top_of_set) Then
                sSql = sSql & "'" & oledata_array(col_number - 1, row_index) & "'"
                col_number = col_number + 1
            Else
                sSql = sSql & "''"
            End If
    
            'finish the sql string and execute
    
        Next x