powershell  powershell-2.0  powershell-3.0  cpu-word  word-count

Word counter with regex in PowerShell


var words = "word worddd woord wooord 45555";

var wordCount = words.match(/([a-zA-Z]\w+)/g).length;

if(wordCount == 4 || wordCount == 6 ){
  WScript.Echo(wordCount);//Result 4
}

How can I make a .ps1 script that works like this JScript code?


Solution

  • Generally speaking: the Measure-Object cmdlet has a -Word switch:

    $words = "word worddd woord wooord"
    
    $wordCount = ($words | Measure-Object -Word).Words
    
    if ($wordCount -in 4, 6){
       $wordCount # -> 4
    }
    

    Note: If you really need PS v2 support, use 4, 6 -contains $wordCount in lieu of $wordCount -in 4, 6, because the -in operator was only introduced in PS v3.
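
    As a minimal sketch of the PS v2-compatible variant described in the note, assuming the same sample string as above, the only change is that the collection moves to the left-hand side of -contains:

```powershell
$words = "word worddd woord wooord"

# Measure-Object -Word counts the whitespace-separated tokens in the input.
$wordCount = ($words | Measure-Object -Word).Words

# -contains expects the collection on the LHS and is available in PS v2;
# -in (collection on the RHS) needs PS v3 or later.
if (4, 6 -contains $wordCount) {
    $wordCount # -> 4
}
```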


    To restrict what is considered a word to something that starts with an (ASCII-range) letter, as in your question, more work is needed:

    $words = "word worddd woord wooord 45555"
    
    # Count only words that start with an (ASCII-range) letter.
    $wordCount = ((-split $words) -match '^[a-z]\w+$').Count
    
    if ($wordCount -in 4, 6){
       $wordCount # -> 4, because '45555' wasn't counted.
    }
    
    • The unary form of -split, the string splitting operator, splits the string into an array of tokens by whitespace.

    • -match, the regular-expression matching operator, matches each resulting token against the RHS regex:

      • -match finds substrings by default, hence the need for anchors ^ and $
      • -match is case-insensitive by default (as PowerShell generally is), so [a-z] covers both lower- and uppercase letters.
    • (...).Count returns the length (element count) of the resulting array of matching tokens.
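
    The points above can be checked one step at a time. The following is only an illustrative sketch: the intermediate variable names ($tokens, $matching, and so on) and the extra sample strings 'word 4you' and 'Word' are assumptions made for the demonstration and are not part of the original answer.

```powershell
$words = "word worddd woord wooord 45555"

# Step 1: unary -split breaks the string into tokens on runs of whitespace.
$tokens = -split $words
$tokens.Count        # -> 5

# Step 2: with an array on the LHS, -match acts as a filter and returns
# only the tokens that match the regex. @(...) just guarantees an array.
$matching = @($tokens -match '^[a-z]\w+$')
$matching.Count      # -> 4 ('45555' is filtered out)

# Why the anchors matter: without ^ and $, a substring match is enough,
# so '4you' is accepted because it contains 'you'.
$unanchored = @((-split 'word 4you') -match '[a-z]\w+').Count      # -> 2
$anchored   = @((-split 'word 4you') -match '^[a-z]\w+$').Count    # -> 1

# Case sensitivity: -match ignores case, -cmatch does not.
$insensitive = 'Word' -match  '^[a-z]\w+$'    # -> True
$sensitive   = 'Word' -cmatch '^[a-z]\w+$'    # -> False
```

    Note that, exactly like the regex in the question, '^[a-z]\w+$' requires at least two characters, so a single-letter token such as 'a' is not counted. If such tokens should count as words, replacing \w+ with \w* would be one way to allow them.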