Search code examples
rvbaexcelpivot-tablereshape2

melt / reshape in excel using VBA?


I'm currently adjusting to a new job where most of the work I share with colleagues is via MS Excel. I am using pivot tables frequently, and therefore need "stacked" data, precisely the output of the melt() function in the reshape (reshape2) package in R that I've come to rely on for this.

Could anyone get me started on a VBA macro to accomplish this, or does one exist already?

The outline of the macro would be:

  1. Select a range of cells in an Excel workbook.
  2. Start "melt" macro.
  3. Macro would create a prompt, "Enter number of id columns", where you would enter the number preceding columns of identifying information. (for the example R code below it's 4).
  4. Create a new worksheet in the excel file titled "melt" that would stack the data, and create a new column titled "variable" equal to the data column headers from the original selection.

In other words, the output would look exactly the same as the output of simply executing these two lines in R:

require(reshape)
melt(your.unstacked.dataframe, id.vars = 1:4)

Here's an example:

# unstacked data
> df1
  Year Month Country  Sport No_wins No_losses High_score Total_games
2 2010     5     USA Soccer       4         3          5           9
3 2010     6     USA Soccer       5         3          4           8
4 2010     5     CAN Soccer       2         9          7          11
5 2010     6     CAN Soccer       4         8          4          13
6 2009     5     USA Soccer       8         1          4           9
7 2009     6     USA Soccer       0         0          3           2
8 2009     5     CAN Soccer       2         0          6           3
9 2009     6     CAN Soccer       3         0          8           3

# stacking the data
> require(reshape)
> melt(df1, id.vars=1:4)

  Year Month Country  Sport    variable value
1  2010     5     USA Soccer     No_wins     4
2  2010     6     USA Soccer     No_wins     5
3  2010     5     CAN Soccer     No_wins     2
4  2010     6     CAN Soccer     No_wins     4
5  2009     5     USA Soccer     No_wins     8
6  2009     6     USA Soccer     No_wins     0
7  2009     5     CAN Soccer     No_wins     2
8  2009     6     CAN Soccer     No_wins     3
9  2010     5     USA Soccer   No_losses     3
10 2010     6     USA Soccer   No_losses     3
11 2010     5     CAN Soccer   No_losses     9
12 2010     6     CAN Soccer   No_losses     8
13 2009     5     USA Soccer   No_losses     1
14 2009     6     USA Soccer   No_losses     0
15 2009     5     CAN Soccer   No_losses     0
16 2009     6     CAN Soccer   No_losses     0
17 2010     5     USA Soccer  High_score     5
18 2010     6     USA Soccer  High_score     4
19 2010     5     CAN Soccer  High_score     7
20 2010     6     CAN Soccer  High_score     4
21 2009     5     USA Soccer  High_score     4
22 2009     6     USA Soccer  High_score     3
23 2009     5     CAN Soccer  High_score     6
24 2009     6     CAN Soccer  High_score     8
25 2010     5     USA Soccer Total_games     9
26 2010     6     USA Soccer Total_games     8
27 2010     5     CAN Soccer Total_games    11
28 2010     6     CAN Soccer Total_games    13
29 2009     5     USA Soccer Total_games     9
30 2009     6     USA Soccer Total_games     2
31 2009     5     CAN Soccer Total_games     3
32 2009     6     CAN Soccer Total_games     3

Solution

  • I've got two posts, with usable code and downloadable workbook, on doing this in Excel/VBA on my blog:

    http://yoursumbuddy.com/data-normalizer

    http://yoursumbuddy.com/data-normalizer-the-sql/

    Here's the code:

    'Arguments
    'List: The range to be normalized.
    'RepeatingColsCount: The number of columns, starting with the leftmost,
    '   whose headings remain the same.
    'NormalizedColHeader: The column header for the rolled-up category.
    'DataColHeader: The column header for the normalized data.
    'NewWorkbook: Put the sheet with the data in a new workbook?
    '
    'NOTE: The data must be in a contiguous range and the
    'columns that will be repeated must be to the left,
    'with the columns to be normalized to the right.
    
    Sub NormalizeList(List As Excel.Range, RepeatingColsCount As Long, _
        NormalizedColHeader As String, DataColHeader As String, _
        Optional NewWorkbook As Boolean = False)
    
    Dim FirstNormalizingCol As Long, NormalizingColsCount As Long
    Dim ColsToRepeat As Excel.Range, ColsToNormalize As Excel.Range
    Dim NormalizedRowsCount As Long
    Dim RepeatingList() As String
    Dim NormalizedList() As Variant
    Dim ListIndex As Long, i As Long, j As Long
    Dim wbSource As Excel.Workbook, wbTarget As Excel.Workbook
    Dim wsTarget As Excel.Worksheet
    
    With List
        'If the normalized list won't fit, you must quit.
       If .Rows.Count * (.Columns.Count - RepeatingColsCount) > .Parent.Rows.Count Then
            MsgBox "The normalized list will be too many rows.", _
                   vbExclamation + vbOKOnly, "Sorry"
            Exit Sub
        End If
    
        'You have the range to be normalized and the count of leftmost rows to be repeated.
       'This section uses those arguments to set the two ranges to parse
       'and the two corresponding arrays to fill
       FirstNormalizingCol = RepeatingColsCount + 1
        NormalizingColsCount = .Columns.Count - RepeatingColsCount
        Set ColsToRepeat = .Cells(1).Resize(.Rows.Count, RepeatingColsCount)
        Set ColsToNormalize = .Cells(1, FirstNormalizingCol).Resize(.Rows.Count, NormalizingColsCount)
        NormalizedRowsCount = ColsToNormalize.Columns.Count * .Rows.Count
        ReDim RepeatingList(1 To NormalizedRowsCount, 1 To RepeatingColsCount)
        ReDim NormalizedList(1 To NormalizedRowsCount, 1 To 2)
    End With
    
    'Fill in every i elements of the repeating array with the repeating row labels.
    For i = 1 To NormalizedRowsCount Step NormalizingColsCount
        ListIndex = ListIndex + 1
        For j = 1 To RepeatingColsCount
            RepeatingList(i, j) = List.Cells(ListIndex, j).Value2
        Next j
    Next i
    
    'We stepped over most rows above, so fill in other repeating array elements.
    For i = 1 To NormalizedRowsCount
        For j = 1 To RepeatingColsCount
            If RepeatingList(i, j) = "" Then
                RepeatingList(i, j) = RepeatingList(i - 1, j)
            End If
        Next j
    Next i
    
    'Fill in each element of the first dimension of the normalizing array
    'with the former column header (which is now another row label) and the data.
    With ColsToNormalize
        For i = 1 To .Rows.Count
            For j = 1 To .Columns.Count
                NormalizedList(((i - 1) * NormalizingColsCount) + j, 1) = .Cells(1, j)
                NormalizedList(((i - 1) * NormalizingColsCount) + j, 2) = .Cells(i, j)
            Next j
        Next i
    End With
    
    'Put the normal data in the same workbook, or a new one.
    If NewWorkbook Then
        Set wbTarget = Workbooks.Add
        Set wsTarget = wbTarget.Worksheets(1)
    Else
        Set wbSource = List.Parent.Parent
        With wbSource.Worksheets
            Set wsTarget = .Add(after:=.Item(.Count))
        End With
    End If
    
    With wsTarget
        'Put the data from the two arrays in the new worksheet.
       .Range("A1").Resize(NormalizedRowsCount, RepeatingColsCount) = RepeatingList
        .Cells(1, FirstNormalizingCol).Resize(NormalizedRowsCount, 2) = NormalizedList
    
        'At this point there will be repeated header rows, so delete all but one.
       .Range("1:" & NormalizingColsCount - 1).EntireRow.Delete
    
        'Add the headers for the new label column and the data column.
       .Cells(1, FirstNormalizingCol).Value = NormalizedColHeader
        .Cells(1, FirstNormalizingCol + 1).Value = DataColHeader
    End With
    End Sub
    

    You’d call it like this:

    Sub TestIt()
    NormalizeList ActiveSheet.UsedRange, 4, "Variable", "Value", False
    End Sub