Extract email:password

I'm curious if there's a way to extract email:password pairs from a big list. Each pair appears in the text in that format, but with a few other unusable parts in front of it (such as first name, last name).

The format is mostly:

But sometimes can be like this as well:

I have tried with EmEditor and if I search for


it does find it. I then have to replace with \1; however, this takes ages and eventually crashes (the file is 17 GB).

Knowing that PowerShell can do this too, I'm looking for the right command.


  • The switch statement allows combining efficient line-by-line processing of files (via the -File parameter), optionally combined with regex-matching (via the -Regex option):

    & { 
      switch -regex -file in.txt { 
        '(?<=:)[^@:]+@[^:]+:.*' { $Matches[0] } 
      }
    } | Set-Content -Encoding utf8 out.txt

    Adjust the -Encoding argument as needed; note that in Windows PowerShell utf8 creates a file with a BOM, whereas PowerShell [Core] v6+ creates one without a BOM. By default, Set-Content uses the system's active ANSI code page in Windows PowerShell, whereas PowerShell [Core] v6+ consistently defaults to BOM-less UTF-8, across all cmdlets.

    The above extracts the email-password pairs from file in.txt and writes them as individual lines to file out.txt.
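
    To check what the regex captures before running it against the full file, it can be tried on a few in-memory strings first. This is only a sketch: the sample lines below are hypothetical stand-ins (the real file's exact layout is an assumption), and the -match operator is used in place of switch -Regex because it populates the same automatic $Matches variable.

    ```powershell
    # Hypothetical sample lines; adjust them to mirror the real input.
    $samples = @(
      'John:Doe:john@example.com:secret123'   # name parts in front of the pair
      'Jane:jane.roe@example.org:pa:ss'       # password that itself contains ':'
      'no email on this line'                 # should produce no output
    )

    $results = foreach ($line in $samples) {
      # On a successful match, $Matches[0] holds the whole matched text,
      # i.e. everything from the email address to the end of the line.
      if ($line -match '(?<=:)[^@:]+@[^:]+:.*') { $Matches[0] }
    }

    $results
    ```

    Because of the (?<=:) look-behind, a line that starts directly with the email address (with no ':' in front of it) is not matched, so it may be worth testing a sample of that shape too if the file contains such lines.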

    Note: Even though the above performs line-by-line processing, an out-of-memory exception can apparently still occur in Set-Content with very large input files; the .NET-based solution in the next section should fix that, while also significantly speeding up the operation.

    Performance caveat: While the above is memory-efficient, it will be slow with large files; to address that, you must make direct use of the .NET framework, via a System.IO.StreamWriter instance:

    # Create the output file.
    # Note:
    #  * Be sure to use a *full* path, because .NET's current dir. usually differs
    #    from PowerShell's
    #  * UTF-8 *without a BOM* is used as the character encoding by default,
    #    but you may pass a [System.Text.Encoding] instance as needed.
    $sw = [System.IO.StreamWriter]::new("$PWD/out.txt")
    switch -regex -file in.txt { 
       '(?<=:)[^@:]+@[^:]+:.*' { $sw.WriteLine($Matches[0]) } 