I have empty columns at the end of my output dataset which I need to populate by looking up values from another sheet (Index-Match effectively).
I am limited to VBA. I have two approaches.
I'm aware this could be solved by writing the Index-Match formula into the cells manually and dragging down, but I am trying to make it simple for the people who perform this task (and to reduce room for error).
Background:
In ws_ps, Column B holds the lookup value, and Column E is to be populated whenever that row's Column B is not empty.
In ws_priips, Column G corresponds to the lookup value in ws_ps Column B, and Column E holds the value to be retrieved.
Approach 1: Looping through each row and using WorksheetFunction - 12 min runtime
On Error Resume Next 'Match raises a runtime error when there is no hit
last_row_ps = ws_ps.UsedRange.Rows.Count
For ps_row = 2 To last_row_ps
    ws_ps.Cells(ps_row, 5).Value = WorksheetFunction.IfError(WorksheetFunction.Index(ws_priips.Range("A:G"), _
        WorksheetFunction.Match(ws_ps.Cells(ps_row, 2), ws_priips.Range("G:G"), 0), 5), "")
Next ps_row
On Error GoTo 0 'restore normal error handling ("On Error GoTo -1" only clears the current error state)
Approach 2: Loading worksheets into arrays and writing to the worksheet if conditions are met - 15 min runtime
last_row_ps = ws_ps.UsedRange.Rows.Count
last_row_priips = ws_priips.UsedRange.Rows.Count
ps_array = ws_ps.Range("A1:X" & last_row_ps).Value
priips_array = ws_priips.Range("A1:AZ" & last_row_priips).Value
For ps_row = 2 To UBound(ps_array, 1)
    For priips_row = 2 To UBound(priips_array, 1)
        If ps_array(ps_row, 2) <> "" Then
            If ps_array(ps_row, 2) = priips_array(priips_row, 7) Then
                ws_ps.Cells(ps_row, 5).Value = priips_array(priips_row, 5)
                GoTo SkipLoop
            End If
        Else
            GoTo SkipLoop
        End If
    Next priips_row
SkipLoop:
Next ps_row
I implemented these solutions for one column, but I will need to apply them to around 10.
Is there a way to drastically speed this up, without resorting to typing the functions into the worksheet or using Python?
Read the data from the lookup sheet into a 2D array, then build a Dictionary object with the ISIN value as the key and the array "row" number as the value. Use the dictionary to extract matched row values from the array and populate the sheet being filled. It's faster still if you first read the "to be filled" and ISIN columns from the output dataset sheet into an array, fill that using the dictionary and the lookup array, then write it back to the output sheet in one shot at the end. Basically, avoid any cell-by-cell operations.
Using the code below, with both tables containing 150k rows (second table sorted randomly), I filled two columns in the first table in 3-4 sec.
Note: if you need cross-platform support, you can use a Collection in place of the Scripting.Dictionary, which is not available on Mac.
Sub Tester()
    Dim arrDest, arrSrc, ws As Worksheet, isin As String
    Dim dict As Object, r As Long, rMatch As Long
    Dim rngDest As Range

    'both my tables are on one sheet for testing
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    'table to be filled
    Set rngDest = ws.Range("A1").CurrentRegion
    arrDest = rngDest.Value 'read to 2D array
    'data used as lookup table, as an array
    arrSrc = ws.Range("I1").CurrentRegion.Value

    Set dict = CreateObject("scripting.dictionary")
    'map isin to row number in lookup table array
    For r = 2 To UBound(arrSrc, 1)
        isin = arrSrc(r, 1)
        If Len(isin) > 0 Then dict(isin) = r 'assumes no duplicates in lookup table
    Next r

    'loop the report table and try to match values in the lookup table
    For r = 2 To UBound(arrDest, 1)
        isin = arrDest(r, 1)
        If Len(isin) > 0 Then
            If dict.Exists(isin) Then 'have a match?
                rMatch = dict(isin)   'matched row
                arrDest(r, 3) = arrSrc(rMatch, 2) 'copy a couple of values
                arrDest(r, 5) = arrSrc(rMatch, 3)
            End If 'have match
        End If 'not zero-length
    Next r

    rngDest.Value = arrDest 'replace data with updated array
End Sub
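Since the question mentions needing around 10 columns, the same row map can drive several copies per match. A minimal sketch of the fill loop extended that way; the `srcCols`/`dstCols` pairings below are hypothetical and need adjusting to your actual layout:

```vba
'Sketch: fill several destination columns from one matched source row.
'srcCols/dstCols pairings are illustrative only - adjust to your layout.
Dim srcCols As Variant, dstCols As Variant, i As Long
srcCols = Array(2, 3, 4, 6, 8) 'columns to read from arrSrc
dstCols = Array(3, 5, 6, 7, 9) 'matching columns to write in arrDest

For r = 2 To UBound(arrDest, 1)
    isin = arrDest(r, 1)
    If Len(isin) > 0 Then
        If dict.Exists(isin) Then
            rMatch = dict(isin)
            For i = LBound(srcCols) To UBound(srcCols)
                arrDest(r, dstCols(i)) = arrSrc(rMatch, srcCols(i))
            Next i
        End If
    End If
Next r
```

The dictionary build and the final `rngDest.Value = arrDest` write-back are unchanged, so the cost of extra columns is just the inner copy loop.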
My test tables (shorter version):
Added: for anyone who needs cross-platform support (Windows/Mac), here's the basic "row map" approach using a Collection in place of a Dictionary. Loading the collection with 150k items takes about 10% longer than loading a dictionary of the same size, but retrieval by key is actually about 5x faster using a collection. Collection performance also seems to scale better for larger numbers of keys: for 400k items, load and read take approx. 2.2 and 0.7 sec, but for the dictionary it's approx. 10.5 and 10 sec (much slower).
'Using a Collection like a dictionary for mapping column values
' to their position in a dataset
Sub CollectionTest()
    Dim arr, r As Long, col As Collection, t, k As String, v
    Set col = New Collection

    'Source range A1:A150000 filled with "Val_000001", "Val_000002", etc
    ' sorted randomly
    arr = Range("A1").CurrentRegion.Value

    t = Timer
    For r = 1 To UBound(arr, 1)
        k = arr(r, 1)
        'Add the row number as value and cell content as key
        ' note your row keys should be unique
        If IsEmpty(KeyValue(col, k)) Then col.Add r, k
    Next r
    Debug.Print col.Count & " items"
    Debug.Print "Loaded row map in " & Timer - t

    t = Timer
    For r = 1 To 150000
        k = "Val_" & Format(r, "000000")
        v = KeyValue(col, k)
        If r < 5 Then Debug.Print "Key " & k & " at row# " & v
    Next r
    Debug.Print "Retrieved values in " & Timer - t
End Sub

'Retrieve value for key `k` from collection `col`
' Returns Empty if there's no such key
Function KeyValue(col As Collection, k As String)
    On Error Resume Next 'ignore error if no match for `k`
    KeyValue = col.Item(k)
End Function
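To use the Collection version in the actual fill, the match loop from Tester only needs a small change: `KeyValue` returns Empty when the key is absent, so it replaces `dict.Exists` plus the lookup. A sketch, assuming `col` has been loaded with the lookup-table ISINs and row numbers as above, and `arrDest`/`arrSrc` as in Tester:

```vba
'Cross-platform variant of the match loop from Tester
Dim v
For r = 2 To UBound(arrDest, 1)
    isin = arrDest(r, 1)
    If Len(isin) > 0 Then
        v = KeyValue(col, isin) 'row number in arrSrc, or Empty if no match
        If Not IsEmpty(v) Then
            arrDest(r, 3) = arrSrc(v, 2)
            arrDest(r, 5) = arrSrc(v, 3)
        End If
    End If
Next r
```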