
How to exclude everything after a character/string from command output in Linux


This is my file:

abc.test.com
efg.test.com:80/test1/123/xyz
xyz.test.com:443/test1
xab.test.com:80
lmn.test.com/100
com.test.com:10

I am trying to remove all characters after the string ".com" while keeping ".com" itself. I tried sed 's/.com.*//', but it removes ".com" as well:

$ cat test1.txt | grep .com | sed 's/.com.*//'
abc.test
efg.test
xyz.test
xab.test
lmn.test
com.test

Is there a way to remove all characters after a particular string while keeping that string in the output?


Solution

  • You don't have to use both grep and sed; either one is enough.

    Your code 's/.com.*//' replaces the match with an empty string instead of with the .com that you want to keep. Also note that you have to escape the dot as \., otherwise it matches any character.
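
    Putting both fixes together for sed, a minimal sketch could look like the following. It recreates the sample lines from the question in a temporary file (the variable names input and sed_out are only illustrative), escapes the dot, and writes .com back in the replacement.

    ```shell
    # Recreate the sample input from the question in a temporary file.
    input=$(mktemp)
    printf '%s\n' \
      'abc.test.com' \
      'efg.test.com:80/test1/123/xyz' \
      'xyz.test.com:443/test1' \
      'xab.test.com:80' \
      'lmn.test.com/100' \
      'com.test.com:10' > "$input"

    # \. matches a literal dot; the replacement puts .com back.
    sed_out=$(sed 's/\.com.*/.com/' "$input")
    printf '%s\n' "$sed_out"

    rm -f "$input"
    ```

    With the sample input this prints the six host names, each ending in .com.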


    If you are using grep, and there is just a single occurrence of .com on the line, you can match everything up to and including .com and print only the matched part with -o:

    grep -o ".*\.com" file
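
    The single-occurrence caveat matters because .* is greedy: grep -o extends the match to the last .com on the line, whereas a substitution on \.com.* cuts at the first one. A small sketch, using a made-up line that contains .com twice (it is not part of the original file):

    ```shell
    # Hypothetical input with two occurrences of .com.
    line='a.com.test.com:80'

    # Greedy match: keeps everything up to the last .com.
    grep_out=$(printf '%s\n' "$line" | grep -o '.*\.com')

    # Substitution: cuts right after the first .com.
    sed_out=$(printf '%s\n' "$line" | sed 's/\.com.*/.com/')

    printf 'grep: %s\nsed:  %s\n' "$grep_out" "$sed_out"
    ```

    Here grep yields a.com.test.com while sed yields a.com, so which one fits depends on what the data can contain.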
    

    An alternative using awk, replacing the match with .com:

    awk '{sub(/\.com.*/, ".com")}1' file
    

    Both will output

    abc.test.com
    efg.test.com
    xyz.test.com
    xab.test.com
    lmn.test.com
    com.test.com
    

    Note the difference: the sed and awk solutions also print lines that do not contain .com, because they perform a substitution and then print the whole line.

    The grep solution does not print a line without .com, because it only outputs the matched part of lines that match.
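
    To see that difference, here is a small sketch with one extra line that has no .com (example.org:80 is a made-up value, and the variable names are only illustrative). It also shows sed -n with the p flag, which prints only the lines where a substitution happened and so behaves like the grep variant in this respect.

    ```shell
    # Two-line sample: one line with .com, one without (hypothetical data).
    input=$(mktemp)
    printf '%s\n' 'abc.test.com:80' 'example.org:80' > "$input"

    # grep -o prints only the matched part, so the .org line is dropped.
    grep_out=$(grep -o '.*\.com' "$input")

    # awk and plain sed print every line, changed or not.
    awk_out=$(awk '{sub(/\.com.*/, ".com")}1' "$input")
    sed_out=$(sed 's/\.com.*/.com/' "$input")

    # sed -n ... p prints only lines where the substitution succeeded.
    sed_n_out=$(sed -n 's/\.com.*/.com/p' "$input")

    printf 'grep:\n%s\n\nawk:\n%s\n\nsed:\n%s\n\nsed -n:\n%s\n' \
      "$grep_out" "$awk_out" "$sed_out" "$sed_n_out"

    rm -f "$input"
    ```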