Search code examples
regexpowershellcsvmatrixkey

Renaming string via key/value matrix


I am attempting to rename strings that have been pulled from a user-defined field. So, I have set up a matrix that creates a relationship between what comes from the UDF and what I want that string to be renamed as.

Here is pseudo example of my matrix (CSV):

Key,Value
FooBar1234,Test 1
ASDF[1234]=qwerty,Test 2
Yellow,Test 3
YellowString,Test 3
1234567890,Test 1
Foo,Test 1
Jane Smith,Test 4
Jane,Test 4
Jane(ABC),Test 4

Here is an example of the data that my script is getting fed (CSV):

FooBar1234
ASDF[1234]=qwerty
Yellow
YellowString
1234567890
Foo
Jane Smith
Jane
Jane(ABC)
Jane(abc)

Here is my PowerShell script:

$path = 'Path\To\CSV\file.csv'

$data = Import-Csv $path -Header UDFName

$matrix = @{}
#import matrix
$re = Import-Csv 'Path\To\Matrix\file.csv' | ForEach-Object{
    #key equals value
    $matrix[$_.Key] = $_.Value
    [regex]::Escape($_.Key)
}
$re = $re -join '|' -as [regex]


$data | ForEach-Object {

    $udf_rename = $re.Replace($_.UDFName,  { $matrix[$args[0].Value] })

    $_.UDFName = $udf_rename

    $_

}| ConvertTo-Csv | Select-Object -Skip 2 | Set-Content $path

And this is the full output:

"Test 1"
"Test 2"
"Test 3"
"Test 3String"
"Test 1"
"Test 1"
"Test 4"
"Test 4"
"Test 4(ABC)"
"Test 4(abc)"

The issue that I'm running into (aside from the fact that it's a user-defined field 😓) is that when I try to replace the Key with the Value, my code only replaces part of the input string for some inputs. For example, if YellowString is passed through the loop, I would anticipate to see Test 3 as the result. However, because Yellow,Test 3 is part of the matrix, I end up getting Test 3String as the result. My loop only replaces the Yellow portion of YellowString because of the relationship that's set up between Yellow and Test 3 in the matrix. Something similar happens with Jane and Jane(ABC).

Unfortunately, I cannot remove Yellow,Test 3 and the other variations from the matrix, so I need to figure out how to exactly match the input string. I tried stripping all the spaces from the Keys, but it still yields the same result. I also thought that maybe I could attach unique codes to each key and then translate that way, but then I run into the same problem when it comes to replacement. Any ideas? Please let me know if I can provide any more information.


Solution

  • PowerShell's regex functionality and .NET's regex APIs operate on substrings by default.
    (E.g., 'food' -replace 'oo', '@@' yields 'f@@d', as does ([regex] 'oo').Replace('food', '@@'))

    To ensure that each input string is matched in full, anchor each regex with ^...$:

    '^{0}$' -f [regex]::Escape($_.Key)
    

    Note: The above uses -f, the format operator to synthesize the string; alternatively, use
    '^' + [regex]::Escape($_.Key) + '$' or
    "^$([regex]::Escape($_.Key))`$"