Search code examples
sqlexcelcrosstabunpivotvba

UNPIVOT columns using SQL query


I am pulling another workbook containing a table that has Columns like ItemCreationDate, and most columns (total 28 such columns) that begin with the word "Global" in them. i want to

  1. pull these "Global" columns (including ItemCreationDate) into an SQL recordset, add a new Column called Old/New and then
  2. UNPIVOT them i.e. Stack them up one above the other and
  3. in the next column, list their Column items along with their Counts.
  4. Their Counts are derived based on the ItemCreationDate where any date <2015 is OLD & >=2015 is NEW
  5. Final Output should be as shown in Output sheet.

i have attached a Sample.xlsx file where i have shown how i have to arrive at the Output tab starting from the Data sheet. This Data sheet is actually the input that i want to pull into a recordset and spit out the table as shown in the Output sheet. I don't want to create Pivot table as its cumbersome and the data is quite a lot, and i want an alternative SQL approach, wherein i can quickly aggregate the data and insert it into sheet in one go.

i am not using SQL Server, so cannot use the UNPIVOT command or Dynamic SQL to loop thru all the "Global" columns.

Basically i want to form a correct SQL string something like....

Dim arrSQL as variant
......
......
RS.Filter="Like Global*"
......
arrSQL = JOIN(RS.Fields, vbCr)

strSQL = "SELECT [arrSQL], IIF(YEAR([ITEM CREATION DATE])>=YEAR(DATE())-1,""NEW"",""OLD"") AS [New/Old]  from [Data$] GROUP BY...."
strSQL = strSQL & " UNION ALL " & vbcr & _
strSQL = strSQL & " ......

Now, run SQL on same recordset to reduce the columns and get required data format.... i know the above is not quite correct, but something on those lines so that i can get the correct output as show in the Output tab.

can anyone help quickly?

Edits for @a_horse_with_no_name :

see the screenshots of the sample file:

  1. Data sheet: this is actually a table from an input workbook that i want to pull in a Recordset. See the various "Global" column headings and their items that i want to unpivot.

enter image description here

  1. This are the 2 intermediate sheets "New" & "Old" i have to create everytime (that i want to get rid of actually). any items found in 2015 or later are put in New, whereas rest are put in Old.

enter image description here

enter image description here

  1. enter image description here

  2. JFYI, the formulae that are manually used in Output columns are :

C column (New):

=COUNTIF(INDEX(New!$A:$D,0,MATCH($A2,New!$1:$1,0)),Output!$B2)

D Column (Old):

=COUNTIF(INDEX(Old!$A:$D,0,MATCH($A2,Old!$1:$1,0)),Output!$B2)

E Column (% New):

=Output!C2/SUM(C$2:C$6)

F Column (% Old):

=Output!D2/SUM(D$2:D$6)

G Column (Index):

=IF(AND(E2<=0,F2<=0),0,IF(AND(E2>0,F2>0),E2/F2,1))

Hope this helps.


Solution

  • Indeed, you can run SQL queries in MS Excel using the Jet/ACE SQL Engine (a Windows .dll file) which is the very data store that MS Access is connected to by default. And as such, this technology equipped on all PCs is not restricted to any one Office/Windows program.

    Consider the following Excel VBA macro (if using Excel on PC) that connects to ACE via ADO running a union of three aggregate SQL queries (GLOBAL VIT/CALC, GLOBAL FLAVOURS, GLOBAL FLAVOUR GROUP) and conditional New and Old counts/percentages. The latter percentage column pair required subqueries.

    For proper setup, do the following:

    1. Make sure Item Creation Date is in MM-DD-YYYY (US-based) or DD-MM-YYYY (non-US based) date formats which is not how above screenshot or file has date field currently formatted.

      Sub FormatDates() For i = 2 To 2083 Range("A" & i) = CDate(Range("A" & i)) Next i End Sub

    2. Run macro in a different workbook than the one holding the data. Below assumes data workbook holds source information in worksheet named Data.

    3. In query-running workbook, create a blank worksheet named RESULTS which will be populated with query output including column headers.

    VBA Script (two connections available Driver (commented out) and Provider versions)

    Option Explicit
    
    Sub RunSQL()
        Dim cols As Object, datawbk As Workbook, datawks As Worksheet
        Dim lastcol As Integer, i As Integer, j As Variant, output As Variant
    
        Set cols = CreateObject("Scripting.Dictionary")
        Set datawbk = Workbooks.Open("C:\Path\To\Data\Workbook.xlsx;")
        Set datawks = datawbk.Worksheets("Data")
        lastcol = datawks.Cells(7, datawks.Columns.Count).End(xlToLeft).Column
    
        For i = 2 To lastcol
             cols.Add CStr(i - 1), datawks.Cells(1, i).Value
        Next i
    
        datawbk.Close False
        Set datawks = Nothing
        Set datawbk = Nothing
    
        output = DataCapture(cols)
    
    End Sub
    
    Function DataCapture(datacols As Object)
    On Error GoTo ErrHandle
        Dim conn As Object, rst As Object
        Dim strConnection As String
        Dim classSQL As String, itemSQL As String, grpSQL As String, strSQL As String
        Dim i As Integer, fld As Object, d As Variant, lastrow As Integer
    
        Set conn = CreateObject("ADODB.Connection")
        Set rst = CreateObject("ADODB.Recordset")
    
        ' Hard code database location and name '
    '    strConnection = "DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
    '                      & "DBQ=C:\Path\To\Data\Workbook.xlsx;"
        strConnection = "Provider=Microsoft.ACE.OLEDB.12.0;" _
                           & "Data Source='C:\Path\To\Data\Workbook.xlsx;" _
                           & "Extended Properties=""Excel 12.0 XML;HDR=YES IMEX=1;"";"
    
        ' OPEN DB CONNECTION '
        conn.Open strConnection
    
        For Each d In datacols.keys
            strSQL = " SELECT '" & datacols(d) & "' AS [COLUMN], [Data$].[" & datacols(d) & "] AS ITEMS," _
                        & "   SUM(IIF(Year([Item Creation Date]) >= Year(Date()) - 1, 1, 0)) AS NEW," _
                        & " " _
                        & "   SUM(IIF(Year([Item Creation Date]) < Year(Date()) - 1, 1, 0)) AS OLD," _
                        & " " _
                        & "   ROUND(SUM(IIF(Year([Item Creation Date]) >= Year(Date()) - 1, 1, 0)) / " _
                        & "   (SELECT Count(*) FROM [Data$] AS sub" _
                        & "    WHERE Year(sub.[Item Creation Date]) >= Year(Date()) - 1),2) AS NEWPCT," _
                        & " " _
                        & "   ROUND(SUM(IIF(Year([Item Creation Date]) < Year(Date()) - 1, 1, 0)) / " _
                        & "   (SELECT Count(*) FROM [Data$] AS sub" _
                        & "    WHERE Year(sub.[Item Creation Date]) < Year(Date()) - 1),2) AS OLDPCT" _
                        & " FROM [Data$]" _
                        & " GROUP BY [Data$].[" & datacols(d) & "]"
    
            ' OPEN RECORDSET '
            rst.Open strSQL, conn
    
            ' COLUMN HEADERS '
            If d = 1 Then
                i = 0
                Worksheets("RESULTS").Range("A1").Activate
                For Each fld In rst.Fields
                    ActiveCell.Offset(0, i) = fld.Name
                    i = i + 1
                Next fld
            End If
    
            ' DATA ROWS '
            lastrow = Worksheets("RESULTS").Cells(Worksheets("RESULTS").Rows.Count, "A").End(xlUp).Row
            Worksheets("RESULTS").Range("A" & lastrow + 1).CopyFromRecordset rst
    
            rst.Close
        Next d
    
        conn.Close
    
        MsgBox "Successfully processed SQL query!", vbInformation
        Exit Function
    
    ErrHandle:
        MsgBox Err.Number & " - " & Err.Description, vbCritical
        Exit Function
    End Function
    

    Output

    Excel SQL Query Output Screenshot