Suppose I have a list of URLs in a file named URL.txt, and I only want the directories to be output, not files or anything with an extension such as .html or .php. When the script reaches a file or an extension in a URL, it should stop there and move on to the next URL. For example, given:
- https://example.com/tradings/trade/trading?currency=usdt&dest=btc&tab=limit
- https://example.com/account/signup/accounts/signin/account.html
I want results like this:
- https://example.com/tradings/
- https://example.com/tradings/trade/
- https://example.com/account/
- https://example.com/account/signup/
- https://example.com/account/signup/accounts/
- https://example.com/account/signup/accounts/signin/
I tried this command, but it prints only a single path segment instead of the complete URL. I want the complete URL up to each directory, without any file name or extension.
cat URL.txt | rev | cut -d'/' -f 2 | sort -u | rev
Perl to the rescue!
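perl -lne '@parts = split m{/}; print join("/", @parts[0 .. $_]) . "/" for 3 .. $#parts - 1' < URL.txt
split m{/} breaks each line on slashes, so elements 0 to 2 are "https:", an empty string, and the host name. The loop runs from index 3 (the first directory) to the second-to-last element, printing the URL up to that point with a trailing slash each time. The last element (the file name or the part carrying the query string) is never printed. This assumes the query string itself contains no slash.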
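If you would rather stay with standard shell tools, or the query string might contain a slash, the same idea can be sketched in awk. Treat this as a rough sketch, not a tested drop-in: list_dirs is a name I made up, and it assumes every input line is an absolute URL of the form scheme://host/path. It strips the query string and fragment first, then prints each directory prefix once.

```shell
# Hypothetical helper (the name list_dirs is not from the question):
# reads URLs on stdin, prints each directory prefix once, with a trailing slash.
list_dirs() {
  awk '{
    sub(/[?#].*$/, "")             # drop the query string and fragment, if any
    n = split($0, parts, "/")      # parts[1] = scheme, parts[2] = empty, parts[3] = host
    prefix = parts[1] "//" parts[3] "/"
    for (i = 4; i < n; i++) {      # stop before the last segment (file or final name)
      prefix = prefix parts[i] "/"
      if (!seen[prefix]++) print prefix   # skip prefixes already printed
    }
  }'
}
```

Run it as list_dirs < URL.txt. The seen array takes the place of sort -u while keeping the output in input order.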