I'm attempting to parse a txt file into a csv with 4 headers. However, I'm having some troubles successfully splitting/matching text between two strings that spans across multiple lines. I'm not very experienced with regex. Is there a way to do this without regex? Or does anyone know which combination of regex expressions I could use here?
I started with the following code but welcome any suggestions/recommendations if there's a better way to go about this:
$input = Get-Content ‘C:\Example.txt’
$output = ‘C:\Example.csv’
$regex = "'PRINT(.*?\n)PRINT'"
$regex2 = "'INSERT' -or $_ -eq 'UPDATE'"
$matches = [regex]::Matches($input, $regex)::multiline
$array = @()
$matches | for each-object{
$write-object = False
$obj = New-Object System.Object
If $_ -match $regex
}
If ($writeobj){
$obj | Add-Member -type NoteProperty -name row1 -value $row
$array += $obj
}
$array | Export-csv -path $output
DESIRED OUTPUT:
SECTION | UPDATE/INSERT | TABLE | CHANGE BLOCK |
---|---|---|---|
AA | UPDATE | Table_1 | COL_1 = ‘123’ WHERE ID= ‘1’ |
BB | INSERT | Table_2 | (DBO.COL_1) SELECT Col1Value AS [Col_1] FROM Table_1 |
CC | UPDATE | Table_1 | COL_3 |
CC | INSERT | Table_2 | (DBO.COL_3) SELECT Col3Value AS [Col_3] FROM Table_2 |
EXAMPLE .TXT CONTENT:
--AA*************************** --PRINT AA UPDATE Table_1 SET COL_1 = '123' WHERE ID = '1' --BB*************************** --PRINT BB INSERT INTO [dbo].[Table_2] (DBO.COL_1) SELECT Col1Value AS [Col_1] FROM Table_1 --CC*************************** --PRINT CC UPDATE Table_1 SET COL_3 = 'ABC' WHERE ID = '3' AND ID2 = '1a' INSERT INTO [dbo].[Table_2] (DBO.COL_3) SELECT Col3Value AS [Col_3] FROM Table_2
My suggestion is to read the entire file in as a multi-line string using the -Raw
switch parameter. Then you can split the file by sections (AA, BB, etc) and parse each section with regex. I do a bunch of -replace
to strip out just the info I want for each property here.
$input = Get-Content 'C:\Example.txt'
$output = 'C:\Example.csv'
$array = $input -split '(?ms)(?=--[^\*^\r^\n]+?\*{3,}[^\*])'|%{
$Record = $_ -replace '(?ms)^--([^\*]+)\*.+$','$1'
Switch -RegEx (($_ -split '(?ms)(?=UPDATE|INSERT)')){
'^UPDATE' {[PSCustomObject]@{SECTION=$Record;'UPDATE/INSERT'='UPDATE';TABLE=$_ -replace '(?ms)UPDATE (.+?)[\r\n].*','$1';'CHANGE BLOCK'=$_ -replace 'UPDATE.+?[\r\n]+' -replace '[\r\n]+',' ' -replace '^SET '}}
'^INSERT' {[PSCustomObject]@{SECTION=$Record;'UPDATE/INSERT'='INSERT';TABLE=$_ -replace '(?ms)INSERT INTO \S+?\[([^\]]+)].*','$1';'CHANGE BLOCK'=$_ -replace '(?ms)INSERT INTO \S+?\[([^\]]+)]\s*' -replace '[\r\n]+',' '}}
}
}
$array | Export-csv -path $output
That provided me with:
SECTION UPDATE/INSERT TABLE CHANGE BLOCK
------- ------------- ----- ------------
AA UPDATE Table_1 COL_1 = '123' WHERE ID = '1'
BB INSERT Table_2 (DBO.COL_1) SELECT Col1Value AS [Col_1] FROM Table_1
CC UPDATE Table_1 COL_3 = 'ABC' WHERE ID = '3' AND ID2 = '1a'
CC INSERT Table_2 (DBO.COL_3) SELECT Col3Value AS [Col_3] FROM Table_2