How to grep only the directories without any extensions in bash


Suppose I have a list of URLs in a file named URL.txt, and I want only the directories in the output, not files or extensions such as .html or .php. If the script finds a file or an extension in a URL, it should move on to the next URL.

- https://example.com/tradings/trade/trading?currency=usdt&dest=btc&tab=limit
- https://example.com/account/signup/accounts/signin/account.html

I want results like this:

- https://example.com/tradings/
- https://example.com/tradings/trade/
- https://example.com/account/
- https://example.com/account/signup/
- https://example.com/account/signup/accounts/
- https://example.com/account/signup/accounts/signin/

I tried this command, but it does not produce complete URLs. I want complete URL endpoints without any extension.

cat URL.txt | rev | cut -d'/' -f 2 | sort -u | rev


Solution

  • Perl to the rescue!

    perl -lne '@parts = split m{/}; print join "/", @parts[0 .. $_] for 3 .. $#parts - 1' < URL.txt
    
    • -n reads the input line by line and runs the code for each line
    • -l removes newlines from input and adds them to print
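    • Each line is split on /. Indices 0 to 2 hold https:, an empty string, and the host, so the first directory is at index 3. For each index from 3 up to the second-to-last part, the parts from 0 through that index are joined back with /. This prints every directory prefix and skips the final part (the file name or query string).
    • The prefixes are printed without a trailing slash. To add one, pass an empty string as the last element: join "/", @parts[0 .. $_], "".
    • See the Perl documentation for split and join (perldoc -f split, perldoc -f join) for more details.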
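
Since the question asks for bash, the same idea can be sketched without Perl. This is a minimal sketch and not part of the original answer: the function name dir_prefixes is hypothetical, and it assumes every line has the form scheme://host/path. Unlike the Perl one-liner, it strips any query string or fragment first and prints a trailing slash to match the requested output.

```shell
#!/usr/bin/env bash

# dir_prefixes (hypothetical name): read URLs on stdin and print every
# directory prefix, skipping the final segment (file name or query string).
dir_prefixes() {
    local url rest prefix segment
    while IFS= read -r url || [[ -n $url ]]; do
        url=${url%%\?*}                      # drop the query string, if any
        url=${url%%#*}                       # drop the fragment, if any
        # Assumption: only scheme://host/path lines are handled.
        [[ $url == *://*/* ]] || continue
        rest=${url#*://}                     # host/path...
        prefix="${url%%://*}://${rest%%/*}/" # scheme://host/
        rest=${rest#*/}                      # path after the host
        # Consume one segment at a time while another slash remains,
        # so the last segment is never printed.
        while [[ $rest == */* ]]; do
            segment=${rest%%/*}
            rest=${rest#*/}
            [[ -n $segment ]] || continue    # ignore empty segments (//)
            prefix+="$segment/"
            printf '%s\n' "$prefix"
        done
    done
}

# Sample input taken from the question.
dir_prefixes <<'EOF'
https://example.com/tradings/trade/trading?currency=usdt&dest=btc&tab=limit
https://example.com/account/signup/accounts/signin/account.html
EOF
# Prints:
# https://example.com/tradings/
# https://example.com/tradings/trade/
# https://example.com/account/
# https://example.com/account/signup/
# https://example.com/account/signup/accounts/
# https://example.com/account/signup/accounts/signin/
```

To run it on the file from the question, replace the here-document with dir_prefixes < URL.txt. Piping the result through sort -u removes prefixes that several URLs share.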