Tags: regex, bash, unix, sed, escaping

Linux: how to replace an asterisk only when it appears after a certain length of string


I am fairly new to Linux commands. I recently got a large file of strings (4 GB). The file format looks like this:

1,2,http://*.example.org/
1,3,https://*.example.org/
1,4,https://*.example.org/*
1,5,https://example.org/*example

I want to find and replace only the asterisks that appear at the beginning of the line. The result I want is, for example:

1,2,http://replaced.example.org/
1,3,https://replaced.example.org/
1,4,https://replaced.example.org/*
1,5,https://example.org/*example

What I have tried replaces the first occurrence on every line. Is there any way to get the result above?

sed 's/*/replaced/' inputfile > outputfile

Solution

  • You can replace ://*. with ://replaced. using

    sed 's~://\*\.~://replaced.~' file > newfile
    

    Here,

    • ~ is used as a regex delimiter in order to avoid escaping / chars
    • ://\*\. is a POSIX BRE pattern matching ://*. substring (as * and . are special chars, they are escaped)
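As a quick check, the command can be run against the four sample lines from the question. This is only a minimal sketch: the variable names `sample` and `result` are illustrative, and the lines are piped in with printf rather than read from a file.

```shell
# Sample lines from the question, held in a variable instead of a file.
sample=$(printf '%s\n' \
  '1,2,http://*.example.org/' \
  '1,3,https://*.example.org/' \
  '1,4,https://*.example.org/*' \
  '1,5,https://example.org/*example')

# Replace only a "*" that directly follows "://" and is followed by ".".
result=$(printf '%s\n' "$sample" | sed 's~://\*\.~://replaced.~')

printf '%s\n' "$result"
```

The first three lines should come out with replaced in the host part, while the trailing /* on the third line and the /*example on the fourth stay untouched, because neither is preceded by :// and followed by a dot.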

    Note that to match an asterisk at the start of string you just need the ^ anchor. So, to match and replace a * at the start of a string you would use

    sed 's/^\*/replaced/' file > newfile
    

    However, none of your sample texts contain an asterisk at the start of any line.
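To see what the ^ anchor does, here is a small sketch with two made-up lines (they are assumptions for illustration, not taken from the question's file): one that really starts with an asterisk and one sample line where the asterisk sits mid-line.

```shell
# First line starts with "*"; second line has "*" only in the middle.
anchored=$(printf '%s\n' '*.example.org' '1,2,http://*.example.org/' \
  | sed 's/^\*/replaced/')

printf '%s\n' "$anchored"
```

Only the first line changes (to replaced.example.org); the second is printed as-is, since its asterisk is not at the start of the line.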

    If you plan to match and replace an asterisk at a specific position in the string, you can capture a prefix of the required length, then replace the match with a backreference to that group followed by the replacement text. For example:

    sed 's~^\(.\{11\}\)\*~\1replaced~' file > newfile
    

    will replace * only when it is the 12th char in the string (as is the case of the 1,2,http://*.example.org/ string). In the https:// sample lines the asterisk is the 13th char, so those lines are left unchanged.
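The positional approach can be tried on the sample lines as follows. The second command is a variant that is not part of the answer above: it assumes the prefix before the asterisk is always 11 or 12 characters long (single-digit leading fields, http:// or https://), which may well not hold across a real 4 GB file. The variable names are illustrative.

```shell
# Sample lines from the question.
input=$(printf '%s\n' \
  '1,2,http://*.example.org/' \
  '1,3,https://*.example.org/' \
  '1,4,https://*.example.org/*' \
  '1,5,https://example.org/*example')

# Fixed position: replace "*" only when exactly 11 characters precede it.
fixed=$(printf '%s\n' "$input" | sed 's~^\(.\{11\}\)\*~\1replaced~')

# Range variant (assumption): accept 11 or 12 leading characters,
# so both the http:// and the https:// sample lines are covered.
ranged=$(printf '%s\n' "$input" | sed 's~^\(.\{11,12\}\)\*~\1replaced~')

printf '%s\n' "$fixed"
printf '%s\n' "$ranged"
```

With the fixed count only the http:// line is rewritten. With the \{11,12\} interval the first three lines are rewritten and the fourth is left alone. If the leading fields can vary in width, matching on the ://\*\. context instead of a character count is likely the more robust choice.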