
ADODB.Recordset Select Distinct command returning nulls where there are no nulls


Stack Overflow, I've blown my whole morning on this issue. I'm trying to help out a coworker with a script. He's not a programmer; he copied some code off the internet and asked me to modify it to give the results he wanted. I pored over it, scrapped all the unnecessary parts, and rewrote it so it does what I want in a way I understand. To be honest, I only deal with VBScript in these contexts, when a coworker has a script that needs fixing. All my VB experience is in VB6.

The purpose of the script is to take a newline-delimited text file that may contain duplicate entries and write it back out with all duplicates removed.

Set objConnection = CreateObject("ADODB.Connection")
Set objRecordSet = CreateObject("ADODB.Recordset")

strPathToTextFile = "C:\Scripts\"
strFile = "Test.txt"
strOutputFile = "C:\this_is_the_output_changeme.txt"

Dim objFSO, objFile
Set objFSO = CreateObject("Scripting.FileSystemObject")
set objFile = objFSO.CreateTextFile(strOutputFile)

sql = "Select DISTINCT * FROM " & strFile

objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
      "Data Source=" & strPathtoTextFile & ";" & _
          "Extended Properties=""text;HDR=NO;FMT=Delimited"""

objRecordSet.Open sql, objConnection

Do Until objRecordSet.EOF
    objFile.Write(objRecordSet.Fields.Item(0).Value)
    objFile.Write(vbCrLf)
    objRecordSet.MoveNext
Loop

objFile.Close

Seems pretty solid, right? It works fine... depending on the input file. Here's the issue: sometimes it works like a charm, and sometimes it gets confused and reports all non-numeric entries as a single distinct Null.

Here are two example inputs that work just fine:

0
1
1
2
3
4
5
3
5
6
7
8
9
9
9

will output:

0
1
2
3
4
5
6
7
8
9

This input

gray
grey
gray
graey
greay
grey
gray
greasy
greay

outputs:

graey
gray
greasy
greay
grey

But many other inputs make this script crash with a Type mismatch error. If I replace the objFile.Write calls with WScript.Echo, I can see that objRecordSet is returning Nulls.
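The crash is most likely objFile.Write receiving a Null field value. While debugging, a small guard makes the Nulls visible instead of stopping the script. It does not bring back the lost text, and `FieldText` is a hypothetical helper name, not part of the original script:

```vbscript
' Hypothetical helper: turn a Recordset field value into printable text.
' A Null (what the driver hands back for values it could not convert)
' becomes the literal string "NULL" instead of raising Type mismatch.
Function FieldText(vValue)
    If IsNull(vValue) Then
        FieldText = "NULL"
    Else
        FieldText = CStr(vValue)
    End If
End Function

' Usage inside the question's loop:
' objFile.WriteLine FieldText(objRecordSet.Fields.Item(0).Value)
```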

The simplest input that reproduces the error is:

1
1
a
a

If I echo the results for this input, I get:

null
1

Basically, any combination of letters and numbers produces this error: all the letter entries come back as a single Null, while the numbers come out fine.

This seems like very bizarre behavior to me. It looks as if the Recordset decides it will only receive numeric values once it sees enough numbers, and then throws out every letter entry as a Null number. As far as I can tell, it happens with any input that has at least half as many number entries as letter entries.

I have not found a way to tell it to return all items as strings. How should I go about solving this?
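One way to get every line back as a string is to bypass the Jet text driver entirely: read the file with FileSystemObject and deduplicate with a Scripting.Dictionary, so no column type is ever guessed. This is a swapped-in technique, not the schema.ini fix from the Solution. It is a minimal sketch assuming exact, case-sensitive matching and first-seen order (the question's DISTINCT output happens to come back sorted, so add a sort if that order matters); `UniqueLines` is a hypothetical helper name:

```vbscript
' Sketch: remove duplicate lines without ADO, so every line stays a string.
' UniqueLines is a hypothetical helper; it keeps the order lines first appear
' and skips empty lines (e.g. the one left by a trailing newline).
Function UniqueLines(aLines)
    Dim oSeen : Set oSeen = CreateObject("Scripting.Dictionary")
    oSeen.CompareMode = vbBinaryCompare   ' "Gray" and "gray" stay distinct
    Dim sLine
    For Each sLine In aLines
        If Len(sLine) > 0 And Not oSeen.Exists(sLine) Then
            oSeen.Add sLine, True
        End If
    Next
    UniqueLines = oSeen.Keys              ' keys come back in insertion order
End Function

' Usage with the question's variables:
' Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
' Dim sText  : sText = objFSO.OpenTextFile(strPathToTextFile & strFile).ReadAll()
' Dim aOut   : aOut = UniqueLines(Split(Replace(sText, vbCr, ""), vbLf))
' Dim oOut   : Set oOut = objFSO.CreateTextFile(strOutputFile, True)
' oOut.Write Join(aOut, vbCrLf) & vbCrLf
' oOut.Close
```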


Solution

  • The problem is caused by the driver guessing at the data type of the (one and only) column. Help the driver by putting a schema.ini file in the data source folder (the folder passed as Data Source in the connection string).

    My schema.ini for this demo:

    [numbers.txt]
    Format=TabDelimited
    ColNameHeader=False
    Col1=F1 FLOAT
    
    [texts.txt]
    Format=TabDelimited
    ColNameHeader=False
    Col1=F1 TEXT
    
    [mixed.txt]
    Format=TabDelimited
    ColNameHeader=False
    Col1=F1 TEXT
    

    Demo code:

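      Const adClipString = 2   ' ADO StringFormatEnum value used by GetString
    
      ' goFS (apparently a global FileSystemObject in the answer's test harness)
      ' is created here so the snippet runs on its own.
      Dim goFS    : Set goFS = CreateObject("Scripting.FileSystemObject")
      Dim oCN     : Set oCN = CreateObject("ADODB.Connection")
      ' Folder holding the .txt files and the schema.ini shown above
      Dim sTDir   : sTDir   = goFS.GetAbsolutePathName("..\data")
      Dim aTables : aTables = Array("numbers.txt", "texts.txt", "mixed.txt")
    
      oCN.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
          "Data Source=" & sTDir & ";" & _
              "Extended Properties=""text;HDR=NO;FMT=TabDelimited"""
    
      Dim sTable
      For Each sTable In aTables
          Dim sFSpec : sFSpec = goFS.BuildPath(sTDir, sTable)
          ' Raw file content, then every row, then the distinct rows (Null shown as "NULL")
          WScript.Echo "  In:", Replace(goFS.OpenTextFile(sFSpec).ReadAll(), vbCrLf, " ")
          WScript.Echo "Seen:", oCN.Execute("SELECT * FROM [" & sTable & "]").GetString(adClipString, , "", " ", "NULL")
          WScript.Echo " Out:", oCN.Execute("SELECT DISTINCT * FROM [" & sTable & "]").GetString(adClipString, , "", " ", "NULL")
      Next
      oCN.Close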
    

    QED output:

    Unique00 - unique via ADO Text Driver
    =================================================
      In: 2,05 2 1 2,5 3 2,05 2
    Seen: 2,05 2 1 2,5 3 2,05 2
     Out: 1 2 2,05 2,5 3
      In: grey gray gray
    Seen: grey gray gray
     Out: gray grey
      In: 1000 grey 10 gray 9 gray 9 1 gray
    Seen: 1000 grey 10 gray 9 gray 9 1 gray
     Out: 1 10 1000 9 gray grey
    =================================================
    xpl.vbs: Completed successfully. (0) [0.67188 secs]
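Applied to the question's own setup, the schema.ini belongs in C:\Scripts\ (the Data Source folder) and needs a [Test.txt] section declaring the single column as TEXT. A sketch that writes it from the script before the connection is opened; `SchemaFor` is a hypothetical helper, and its settings simply mirror the answer's [mixed.txt] section:

```vbscript
' Sketch: build a schema.ini section so the Jet text driver reads the
' file's only column as TEXT instead of guessing a type.
' SchemaFor is a hypothetical helper; values copy the answer's [mixed.txt].
Function SchemaFor(sFileName)
    SchemaFor = "[" & sFileName & "]" & vbCrLf & _
                "Format=TabDelimited" & vbCrLf & _
                "ColNameHeader=False" & vbCrLf & _
                "Col1=F1 TEXT" & vbCrLf
End Function

' Usage with the question's variables, before objConnection.Open.
' Note: CreateTextFile(..., True) overwrites any existing schema.ini.
' Dim oIni : Set oIni = objFSO.CreateTextFile(objFSO.BuildPath(strPathToTextFile, "schema.ini"), True)
' oIni.Write SchemaFor(strFile)
' oIni.Close
```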