Tags: powershell, url, shortcut-file, file-search

How to search for specific URLs in ".url" files?


I have a directory D:\DMS with lots of subfolders. Within these subfolders there are plenty of ".url" files which include URLs like

http://mmtst399:8080/dms/objekt?page=index&mode=browser&oid=K60081800
http://zrtpwvap877/dms/download?doc=N59748000

Because I later want to replace some of these URLs, I would first like to search for them with PowerShell, e.g. find all URLs that start with http://mmtst399:8080/, all URLs that contain K60081800, or all URLs that start with http://zrtpwvap877/dms/.

Searching for such URLs within .url files seems to be difficult. I have already tried many PowerShell sample scripts using -like and so on, but they usually report "No results, No files found", even though I know that there are .url files in the subfolders that contain such URLs. Of course, it will be difficult to replace URLs in such files if PowerShell cannot even find those .url files.

I would like to search for such URLs in the .url files and get a .txt log with the paths of the files found. I guess searching for such URLs is difficult because some contain =, ?, and other special characters.


Solution

  • Use Select-String with a regex to match the URL parts of interest:

    # The literal URL parts to find, either prefixes or substrings.
    $urlParts = 'http://mmtst399:8080/', 'K60081800', 'http://zrtpwvap877/dms/'
    
    # Formulate a regex that matches any of the above URL parts.
    # The URL lines inside *.url files start with "URL="
    $regex = '^URL=({0})' -f (
      $urlParts.ForEach({ 
        $escaped = [regex]::Escape($_) # Escape for literal matching
        if ($escaped -match '^https?:') { $escaped } 
        else                            { '.*' + $escaped } # Match anywhere in the URL
      }) -join '|'
    )
    
    # Search all *.url files in the subtree of D:\DMS for the URL parts
    # and output the full paths of matching files.
    # -Force ensures inclusion of *hidden* files too.
    # Outputs to the screen; append e.g. > log.txt to save to a file.
    Get-ChildItem -Force -Recurse D:\DMS -Filter *.url | 
      Select-String -Pattern $regex -List | 
      ForEach-Object Path
    

    Note:

    • The URL= entry of a .url file is permitted to contain URLs without a protocol specifier (e.g. example.org instead of https://example.org), which is why the regex for substring matching (as opposed to prefix matching) starts with .* (meaning any run of characters, possibly none).
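
To try the approach without touching D:\DMS, the following is a minimal sketch that builds a throwaway folder with three sample .url files, runs the same search, and writes a tab-separated log of file path and matching URL= line, since the question asks for a text log of the results. The temp folder name dms-demo-..., the sample file names, and the log name found-urls.log are assumptions made up for illustration and are not part of the original answer.

```powershell
# Hypothetical sandbox standing in for D:\DMS (name is an assumption).
$root = Join-Path ([IO.Path]::GetTempPath()) ('dms-demo-' + [guid]::NewGuid())
$sub  = Join-Path $root 'sub'
$null = New-Item -ItemType Directory -Path $sub -Force

# Sample shortcut files: a.url and b.url contain URL parts of interest, c.url does not.
Set-Content -LiteralPath (Join-Path $sub 'a.url') -Value '[InternetShortcut]',
  'URL=http://mmtst399:8080/dms/objekt?page=index&mode=browser&oid=K60081800'
Set-Content -LiteralPath (Join-Path $sub 'b.url') -Value '[InternetShortcut]',
  'URL=http://zrtpwvap877/dms/download?doc=N59748000'
Set-Content -LiteralPath (Join-Path $root 'c.url') -Value '[InternetShortcut]',
  'URL=https://example.org/'

# Same regex construction as in the answer above.
$urlParts = 'http://mmtst399:8080/', 'K60081800', 'http://zrtpwvap877/dms/'
$regex = '^URL=({0})' -f (
  $urlParts.ForEach({
    $escaped = [regex]::Escape($_)           # ?, =, . etc. are matched literally
    if ($escaped -match '^https?:') { $escaped } else { '.*' + $escaped }
  }) -join '|'
)

# Search the sandbox; -List stops at the first matching line per file.
$hits = @(
  Get-ChildItem -Force -Recurse -LiteralPath $root -Filter *.url |
    Select-String -Pattern $regex -List
)

# Write one "<path><TAB><matching line>" entry per file to the log.
$logFile = Join-Path $root 'found-urls.log'
$hits | ForEach-Object { "{0}`t{1}" -f $_.Path, $_.Line } |
  Set-Content -LiteralPath $logFile

# Show the log, then remove the sandbox again.
$logLines = @(Get-Content -LiteralPath $logFile)
$logLines
Remove-Item -LiteralPath $root -Recurse -Force
```

Logging the matching line next to the path makes it easier to check later which URL part triggered the match before any replacement is attempted. Against the real directory, only the Get-ChildItem path and the log location would need to change.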