I am working with several large Filemaker databases that contains very dirty data recorded over the course of 20+ years. Unfortunately, every database is essentially a single flat file with uncontrolled text fields with no validation or value lists of any kind. Not surprisingly different users were VERY inconsistent in the way they recorded certain types of data over that long of a time span (i.e., what and how things are entered into fields varies wildly). I need to clean several important fields in the database before I can pull the data into a sensible relational structure.
One field in particular contains long strings of text that are generally delimited by spaces. Is there a way to split records based on the delimiter in this field, duplicating all other fields for each record? For example, Contents is a space delimited field (which creates its own issues, but is beside the point here) that would be the basis for creating individual records. The number of delimited entries in the field varies from one to hundreds.
would become:
Splitting some of the fields into individual records is the only way I'm going to be able to fix a couple of the issues given the massive amount of inconsistency in data entry and the different ways entries will need to be parsed into new fields. Any help would be appreciated.
This was a bit of a workaround, but I eventually found a much easier solution by exporting the Box_id and Contents fields to Excel and running the following VBA code:
Sub SliceNDice()
Dim objRegex As Object
Dim X
Dim Y
Dim lngRow As Long
Dim lngCnt As Long
Dim tempArr() As String
Dim strArr
Set objRegex = CreateObject("vbscript.regexp")
objRegex.Pattern = "^\s+(.+?)$"
'Define the range to be analysed
X = Range([a1], Cells(Rows.Count, "b").End(xlUp)).Value2
ReDim Y(1 To 2, 1 To 1000)
For lngRow = 1 To UBound(X, 1)
'Split each string by ","
tempArr = Split(X(lngRow, 2), " ")
For Each strArr In tempArr
lngCnt = lngCnt + 1
'Add another 1000 records to resorted array every 1000 records
If lngCnt Mod 1000 = 0 Then ReDim Preserve Y(1 To 2, 1 To lngCnt + 1000)
Y(1, lngCnt) = X(lngRow, 1)
Y(2, lngCnt) = objRegex.Replace(strArr, "$1")
Next
Next lngRow
'Dump the re-ordered range to columns C:D
[c1].Resize(lngCnt, 2).Value2 = Application.Transpose(Y)
End Sub
This solution was modified from this previous question.