Tags: bash, awk, grep, character, capture

bash: keep only one line from each group of lines that share the same first n characters


I want to find lines that share the same first n characters and output only one line from each such group, no matter what comes after the nth character. If a line has fewer than n characters, it should be output as is.

I tried grep -o to capture the first n characters, but it prints only the matched part and drops the rest of the line:

cat myfile.txt | grep -o -P '^{0,41}'

or

cat myfile.txt | grep -o -P '.{0,0}http.{0,41}'

Here is a sample file. Among lines whose first 41 characters are identical, I want to show only one:

https://example.com/first/second/blahblah/?alsda=asldfaalafowiorie
https://example.com/first/second/blahblah/?oriwo=asldkjalkdjf2kasd
https://example.com/first/second/blahblah/some/more/dir
https://example.com/another/one
https://example.com/third/fourth/something/?cldl=aosijfoiret
https://example.com/third/fourth/something/?cldl=5145652
https://example.com/third/fourth/something/?hfdg=156569&wuew=8428
https://example.com/first/second/blahblah/

Desired output

https://example.com/first/second/blahblah/?alsda=asldfaalafowiorie
https://example.com/another/one
https://example.com/third/fourth/something/?cldl=aosijfoiret

Thanks.


Solution

  • awk '!seen[substr($0,1,41)]++' file
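
To spell out why this works: substr($0,1,41) takes the first 41 characters of the line (or the whole line if it is shorter), and seen[...]++ evaluates to the number of times that prefix has already appeared, then increments it. The leading ! makes the pattern true only for the first line with a given prefix, and awk's default action prints that line in full. Below is a minimal sketch that wraps the same one-liner in a shell function so the length is not hard-coded. The function name dedup_by_prefix and its argument order are my own choices for illustration, not something from the question.

```shell
# Hypothetical helper (name and argument order are assumptions for this sketch).
# Prints the first line seen for each distinct prefix of length n.
#   $1 = input file
#   $2 = prefix length n (defaults to 41, the value used in the question)
dedup_by_prefix() {
    # substr($0, 1, n) is the comparison key: the first n characters,
    # or the whole line when it is shorter than n.
    # seen[key]++ is 0 the first time a key appears, so !seen[key]++ is
    # true only then, and awk's default action prints the full line.
    awk -v n="${2:-41}" '!seen[substr($0, 1, n)]++' "$1"
}
```

Calling dedup_by_prefix myfile.txt on the sample file from the question should print the three lines listed under the desired output. The first matching line of each group is kept and the input order is preserved. One caveat: with non-ASCII input, whether substr counts bytes or characters depends on the awk implementation and locale, so the result may differ between systems.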