I am totally new to Python.
I'm trying to write a Python equivalent of this bash command: cat domains.txt |sort -u|sed 's/^*.//g' > domains2.txt
The file domains.txt contains a list of domains, some with the wildcard prefix *. and some without,
like:
*.example.com
example2.org
It has 300k+ lines.
I wrote this code:
infile = "domains.txt"
outfile = "2"
outfile2 = "3"
# Step 1: remove every "*." and write the result to an intermediate file
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        line = line.replace('*.', "")
        fout.write(line)
# Step 2: sort the intermediate file into the final output
with open('2', 'r') as r, open(outfile2, "w") as fout2:
    for line in sorted(r):
        print(line, end='', file=fout2)
It strips the *. as planned and sorts the list, but it doesn't remove duplicate lines.
I was advised to use re.sub instead of replace to make the pattern stricter (anchored to the beginning of the line, as in my sed command), but when I tried this:
import re
infile = "domains.txt"
outfile = "2"
outfile2 = "3"
with open(infile) as fin, open(outfile, "w+") as fout:
    for line in fin:
        newline = re.sub('^*.', '', line)
        fout.write(newline)
with open('2', 'r') as r, open(outfile2, "w") as fout2:
    for line in sorted(r):
        print(line, end='', file=fout2)
it just fails with errors that I don't understand.
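In regular expressions, * and . (and similar characters) are special, so you have to escape them to match them literally. In '^*.' the * has nothing to repeat except the ^ anchor, so re.sub raises re.error ("nothing to repeat"); and an unescaped . would match any character, not just a dot. Escape both, using a raw string so the backslashes reach the regex engine unchanged: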
>>> import re
>>> s = "*.example.com"
>>> re.sub(r'^\*\.', '', s)
'example.com'
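Putting this together with deduplication, here is a minimal sketch of a Python equivalent of the whole pipeline. It assumes the file fits in memory (300k short lines should be manageable), and the names clean_domains, main and WILDCARD_PREFIX are hypothetical. Duplicates are removed by collecting the lines in a set before sorting, which is what sorted() alone does not do.

```python
import re

# Anchored pattern: a literal "*" followed by a literal "." at the start of a line.
WILDCARD_PREFIX = re.compile(r'^\*\.')


def clean_domains(lines):
    """Strip a leading '*.' from each line, drop duplicates, and sort.

    Deduplication happens after stripping, so '*.example.com' and
    'example.com' collapse into a single 'example.com'.
    """
    # A set keeps only one copy of each stripped domain.
    stripped = {WILDCARD_PREFIX.sub('', line.rstrip('\n')) for line in lines}
    return sorted(stripped)


def main(infile="domains.txt", outfile="domains2.txt"):
    # Hypothetical default file names, matching the bash command.
    with open(infile) as fin:
        domains = clean_domains(fin)
    with open(outfile, "w") as fout:
        for domain in domains:
            fout.write(domain + "\n")
```

Call main() to read domains.txt and write domains2.txt. Two differences from the bash version are worth knowing: there, sort -u runs before sed, so duplicates created by stripping *. survive; and sort orders lines by the locale's collation, while sorted() compares code points, so the order can differ unless sort is run with LC_ALL=C. If you prefer not to write the backslashes by hand, '^' + re.escape('*.') builds the same pattern.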