
Need a PowerShell script to validate user input as valid Unix path syntax


I need to validate that the user's input is syntactically valid as a Unix path; the path does not have to exist on the host machine.

The input can contain one or more paths, separated by commas or whitespace, each enclosed in single quotes, double quotes, or no quotes at all.

The PowerShell attempt below fails to validate these conditions:

  - name: Validate Inputs

    run: |

      $inputPaths = "${{ inputs.source_files }}"

      # Check if the input is not empty

      if (-not $inputPaths) {
        echo "Error: 'paths' input is required."
        exit 1
      }

      # Check syntax of each provided path
      $pathsArray = $inputPaths -split ',| '

      foreach ($path in $pathsArray) {

        if (-not ($path -match "^[a-zA-Z]:\\|\\\\|/'[^'\s].*'$|^[a-zA-Z]:\\|\\\\|/\"[^\"\s].*\"$|^[a-zA-Z]:\\|\\\\|/[^'\s]+$")) {
          echo "Error: '$path' is not a valid absolute path syntax."
          exit 1

        }
      }

      echo "Inputs have valid syntax."

Valid inputs:

/tmp/mydir
'/tmp/my  dir1'
"/tmp/my  dir2"
/tmp/mydir '/tmp/my  dir1' '/tmp/my  dir2'
'/tmp/my  dir1','/tmp/my  dir2'

Invalid inputs:

'/tmp/my  dir1,/tmp/my  dir2'
/tmp/my  dir1
'/tmp/my  dir1
/tmp/my  dir1'

I also tried validating the quoting, but it reports an error for a correctly quoted path:

$paths = "'/u/marsh/UNX/scripts/testscript/test_maillist.txt' '/pathx with space/file1' '/path,with,commas/file2' ""/double quoted path/file3"" ""/path with space/file4"" 'single quoted path/file5' /pathx with space/file1"

# Split paths by whitespace or comma while preserving paths enclosed in quotes

$splitPaths = $paths -split "(?<=\S)\s+|(?<=\S),"

foreach ($path in $splitPaths) {

    # Check if the path is enclosed in single or double quotes

    if (-not (($path -like "'*'") -or ($path -like '"*"'))) {

        Write-Host "Error: Path '$path' is not enclosed in single or double quotes."
        exit 1
    }

    # Remove leading and trailing quotes

    $cleanPath = $path.Trim("'").Trim('"')  

    Write-Host "Cleaned Path: $cleanPath"

}

Output, which reports an error when it should not:

Cleaned Path: /u/marsh/UNX/scripts/testscript/test_maillist.txt
Error: Path ''/pathx' is not enclosed in single or double quotes.

Kindly suggest.


Solution

  • It looks like your input paths are in the form of lists of string literals and/or barewords:

    • One of your invalid path examples - '/tmp/my dir1,/tmp/my dir2' - seems to impose a non-obvious constraint on your validation:

      • Verbatim /tmp/my dir1,/tmp/my dir2 is formally a valid single path, given that , is a legitimate character in file names.

        • Fundamentally, as tripleee points out, technically it is only NUL (the character with code point 0x0) that is invalid in paths in file-systems on Unix-like platforms.
      • The solution below therefore disallows the presence of verbatim , in a single path - adjust as needed.
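
To make the difference between the two rules concrete, here is a minimal sketch that checks a few sample paths against a lenient rule (absolute path, anything except NUL) and against the stricter rule that also rejects a verbatim `,`. The `$samples` values, the lenient regex, and the `Lenient`/`Strict` property names are assumptions made up for illustration; the strict regex is the one the solution below uses.

```powershell
# Hypothetical sample paths; "`0" embeds a NUL character.
$samples = '/tmp/my dir1,/tmp/my dir2', '/tmp/mydir', "/tmp/bad`0name"

$results = foreach ($p in $samples) {
  [pscustomobject] @{
    # Show NUL as a visible placeholder so the table prints cleanly.
    Path    = $p.Replace("`0", '<NUL>')
    # Lenient rule (assumption): absolute path, every character except NUL allowed.
    Lenient = $p -match '^/[^\0]*$'
    # Strict rule: additionally rejects a verbatim "," in the path.
    Strict  = $p -match '^/(?:[^/\0,]+/?)*$'
  }
}
$results
```

Only the first sample is judged differently by the two rules, which is exactly the case the question lists as invalid.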

    The following solution uses a two-step approach:

    • It first parses a list of paths into the verbatim items it represents, via a direct call to the [regex]::Match() API.

      • For an explanation of the regex used with [regex]::Match() and the option to experiment with it, see this regex101.com page.

      • Note its limitation: For (relative) simplicity, it only supports embedded quoting that uses the quotation mark not used by the outer quoting (e.g.,
        '/foo/3" of snow' or "/foo/3'o clock"), but not escaped embedded quoting (e.g., "/foo/3`" of snow" or '/foo/3''o clock').
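
The quoting limitation can be tried out in isolation. The sketch below wraps the list-parsing step in a helper named `Get-ListItem`, which is a hypothetical name and not part of the original answer, and it assumes the closing quote is matched with the named backreference `\k<quote>`. The three inputs are the examples from the note above.

```powershell
function Get-ListItem {   # hypothetical helper, for illustration only
  param([string] $List)
  # List-parsing regex; the closing quote is matched via the
  # named backreference \k<quote> (assumed form).
  $m = [regex]::Match(
    $List,
    '^\s*((?:(?<item>[^"''\s,]+)|(?<quote>["''])(?<item>.*?)\k<quote>)(?:(?:\s*,?\s*)|$))+$'
  )
  # Emit the parsed items; emit nothing if the list is malformed.
  if ($m.Success) { $m.Groups['item'].Captures.Value }
}

# Supported: the embedded quote differs from the enclosing one.
$snow  = @(Get-ListItem '''/foo/3" of snow''')   # verbatim input: '/foo/3" of snow'
$clock = @(Get-ListItem '"/foo/3''o clock"')     # verbatim input: "/foo/3'o clock"

# Not supported: '' as an escaped ' inside a single-quoted item.
# The doubled quote is read as the end of one item and the start of the next.
$escaped = @(Get-ListItem '''/foo/3''''o clock''')   # verbatim input: '/foo/3''o clock'

$snow; $clock; $escaped
```

In the unsupported case the parse still succeeds, but it yields two items (`/foo/3` and `o clock`) instead of one, so the second item fails the later absolute-path check.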

    • It then validates each item with respect to whether it represents an absolute Unix-format path, using PowerShell's -match operator.

      • For an explanation of the regex used with -match and the option to experiment with it, see this regex101.com page.

    # Sample input paths.
    @(
      # --- valid
      '/tmp/mydir'
      "'/tmp/my  dir1'"
      '"/tmp/my  dir2"'
      "/tmp/mydir '/tmp/my  dir1' '/tmp/my  dir2'"
      "'/tmp/my  dir1','/tmp/my  dir2'"
      # --- invalid
      "'/tmp/my  dir1,/tmp/my  dir2'"
      '/tmp/my  dir1'  # partly valid (1st token)
      "'/tmp/my  dir1"
      "/tmp/my  dir1'"  
    ) | 
      ForEach-Object {
        # Parse each string as a comma- or whitespace-separated list composed of
        # string literals and/or barewords.
        $match = [regex]::Match(
          $_,
          '^\s*((?:(?<item>[^"''\s,]+)|(?<quote>["''])(?<item>.*?)\k<quote>)(?:(?:\s*,?\s*)|$))+$'
        )
        if (-not $match.Success) {
          # Not a well-formed list of string literals and barewords:
          # Report the entire string as invalid.
          [pscustomobject] @{
            Path  = $_
            Valid = $false
          }
        }
        else {
          # List of string literals and barewords, validate each list item.
          $match.Groups['item'].Captures.Value | 
            ForEach-Object {
              [pscustomobject] @{
                Path  = $_
                # To allow "," in paths, remove "," from the regex below.
                Valid = $_ -match '^/(?:[^/\0,]+/?)*$'
              }
            }
        }
      }
    

    Output (note that each output line represents a (successfully parsed) individual path):

    Path                        Valid
    ----                        -----
    /tmp/mydir                   True
    /tmp/my  dir1                True
    /tmp/my  dir2                True
    /tmp/mydir                   True
    /tmp/my  dir1                True
    /tmp/my  dir2                True
    /tmp/my  dir1                True
    /tmp/my  dir2                True
    /tmp/my  dir1,/tmp/my  dir2 False
    /tmp/my                      True
    dir1                        False
    '/tmp/my  dir1              False
    /tmp/my  dir1'              False
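
For use in a workflow step like the one in the question, the two steps could be folded into a single pass/fail check. The following is only a sketch under assumptions: the function name `Test-UnixPathList` and its `-AllowComma` switch are hypothetical, the closing quote is assumed to be matched with `\k<quote>`, and the sample strings stand in for the value of `${{ inputs.source_files }}`.

```powershell
function Test-UnixPathList {   # hypothetical name, for illustration only
  param(
    [string] $List,
    [switch] $AllowComma   # hypothetical switch: permit "," inside a path
  )
  # Step 1: parse the list into its verbatim items.
  $listRegex = '^\s*((?:(?<item>[^"''\s,]+)|(?<quote>["''])(?<item>.*?)\k<quote>)(?:(?:\s*,?\s*)|$))+$'
  # Step 2: regex for an absolute Unix-format path, with or without ",".
  $pathRegex = if ($AllowComma) { '^/(?:[^/\0]+/?)*$' } else { '^/(?:[^/\0,]+/?)*$' }

  $match = [regex]::Match($List, $listRegex)
  if (-not $match.Success) { return $false }   # malformed list (or empty input)

  foreach ($capture in $match.Groups['item'].Captures) {
    if ($capture.Value -notmatch $pathRegex) { return $false }
  }
  return $true
}

# Sample inputs taken from the question's valid and invalid examples.
$checks = @(
  "/tmp/mydir '/tmp/my  dir1' '/tmp/my  dir2'"
  "'/tmp/my  dir1','/tmp/my  dir2'"
  '/tmp/my  dir1'
  "'/tmp/my  dir1"
) | ForEach-Object {
  [pscustomobject] @{
    Input = $_
    Valid = (Test-UnixPathList $_)
  }
}
$checks
```

In the workflow, a `$false` result could then be turned into an error message followed by `exit 1`, as in the original attempt. Unlike the per-item table above, this variant rejects the whole input as soon as one item is invalid.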