Is there a bash command or script on Linux that can extract the active domains from a long list?
For example, I have a CSV file (domains.csv) that lists 55 million domains, and we need only the active domains in a CSV file (active.csv).
Here, active means a domain that serves at least a web page, regardless of whether its registration has expired. For example, whoisdatacenter.info is not expired but has no web page, so we consider it non-active.
I searched Google and the Stack sites and found two ways to check a domain:
$ curl -Is google.com | grep -i location
Location: http://www.google.com/
or
$ nslookup google.com | grep -i name
Name: google.com
But I have no idea how to write a bash program that does this for 55 million domains.
The commands below return nothing for an inactive domain, so I concluded that nslookup and curl are the way to get the result:
$ nslookup whoisdatacenter.info | grep -i name
$ curl -Is whoisdatacenter.info | grep -i location
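For reference, the curl test can be wrapped in a small function that looks at the HTTP status code instead of the Location header. This is only a sketch: the function names status_is_live and is_active are made up for illustration, and treating any 2xx or 3xx response as "has a web page" is an assumption, as are the timeout values. Some servers reject HEAD requests, so dropping -I may be necessary for them.

```shell
# status_is_live: succeed for 2xx and 3xx HTTP status codes (assumed to mean
# "a web page is served"); fail for everything else, including curl's 000.
status_is_live() {
    case $1 in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# is_active: succeed when http://DOMAIN/ answers a HEAD request with a live status.
# curl prints 000 as the status when it cannot resolve or connect.
is_active() {
    code=$(curl -s -o /dev/null -I -w '%{http_code}' \
                --connect-timeout 5 --max-time 10 "http://$1/") || code=000
    status_is_live "$code"
}
```

It could then be used as: is_active google.com && echo google.com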
The first 25 lines:
$ head -25 domains.csv
"
"0----0.info"
"0--0---------2lookup.com"
"0--0-------free2lookup.com"
"0--0-----2lookup.com"
"0--0----free2lookup.com"
"0--1.xyz"
"0--123456789.com"
"0--123456789.net"
"0--6.com"
"0--7.com"
"0--9.info"
"0--9.net"
"0--9.world"
"0--a.com"
"0--a.net"
"0--b.com"
"0--m.com"
"0--mm.com"
"0--reversephonelookup.com"
"0--z.com"
"0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0.com"
"0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0.com"
"0-0-0-0-0-0-0-0-0-0-0-0-0-10-0-0-0-0-0-0-0-0-0-0-0-0-0.info"
The code I am running:
while read line;
do nslookup "$line" | awk '/Name/';
done < domains.csv > active3.csv
The result I am getting:
sh -x ravi2.sh
+ read line
+ nslookup ''
+ awk /Name/
nslookup: '' is not a legal name (unexpected end of input)
+ read line
+ nslookup '"'
+ awk /Name/
+ read line
+ nslookup '"0----0.info"'
+ awk /Name/
+ read line
+ nslookup '"0--0---------2lookup.com"'
+ awk /Name/
+ read line
+ nslookup '"0--0-------free2lookup.com"'
+ awk /Name/
+ read line
+ nslookup '"0--0-----2lookup.com"'
+ awk /Name/
+ read line
+ nslookup '"0--0----free2lookup.com"'
+ awk /Name/
Still, active3.csv is empty. The script below runs, but something is stopping the bulk lookup, either on my host or somewhere else.
while read line
do
nslookup $(echo "$line" | awk '{gsub(/\r/,"");gsub(/.*-|"$/,"")} 1') | awk '/Name/{print}'
done < input.csv >> output.csv
The bulk nslookup shows the following error:
server can't find facebook.com\013: NXDOMAIN
[Solved] Ravi's script works perfectly. I was running it on my Mac, which gave the nslookup error. On a CentOS Linux server, nslookup works fine with Ravi's script.
Thanks a lot!!
EDIT: Please try my EDIT solution, written as per the OP's shown samples.
while IFS= read -r line
do
# Strip carriage returns and the surrounding double quotes before the lookup.
nslookup "$(echo "$line" | awk '{gsub(/\r/,"");gsub(/^"|"$/,"")} 1')" | awk '/Name/{found=1;next} found && /Address/{print $NF}'
done < "Input_file"
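The per-line cleanup can also be done with plain shell parameter expansion, which avoids starting an extra awk process for every domain. A minimal sketch, assuming each line is a domain wrapped in double quotes with an optional trailing carriage return. clean_domain is a hypothetical helper name, not part of the script above.

```shell
# clean_domain: print the domain with a trailing carriage return and the
# surrounding double quotes removed.
clean_domain() {
    cr=$(printf '\r')   # a literal carriage return character
    d=$1
    d=${d%"$cr"}        # drop a trailing CR (Windows-style line ending)
    d=${d#\"}           # drop the leading double quote
    d=${d%\"}           # drop the trailing double quote
    printf '%s\n' "$d"
}

# Demo on three sample lines; the lone quote cleans to an empty string
# and is skipped.
printf '"\n"0----0.info"\n"0--a.com"\r\n' |
while IFS= read -r line; do
    d=$(clean_domain "$line")
    if [ -n "$d" ]; then
        printf '%s\n' "$d"
    fi
done
```

Hyphens inside the name are left untouched, so a name such as 0--a.com is looked up as written.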
Could you please try the following.
The OP has Control-M (carriage return) characters in her Input_file, so run the following command to remove them first:
tr -d '\r' < Input_file > temp && mv temp Input_file
Then run the following code:
while read line
do
nslookup "$line" | awk '/Name/{found=1;next} found && /Address/{print $NF}'
done < "Input_file"
I am assuming that, since you are passing domain names, you need their addresses (IP addresses) in the output. Also, since your Input_file is huge, it may be a bit slow to produce output, but trust me, this is the simpler way.
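If the sequential loop turns out to be too slow for 55 million names, one possible variation is to run several lookups at once with xargs -P (supported by GNU and BSD xargs, though not required by POSIX). The sketch below is not a tested solution: filter_active, JOBS and CHECK_CMD are hypothetical names introduced here, the default of 20 parallel jobs is arbitrary, and the input is assumed to be already stripped of quotes and carriage returns, because xargs interprets quote characters itself.

```shell
# filter_active: read one cleaned domain per line on stdin and print the ones
# for which the check succeeds. Output order is not preserved.
#   JOBS      - number of parallel lookups (assumed default: 20)
#   CHECK_CMD - shell snippet that succeeds for an active domain given as "$1"
#               (assumed default: nslookup prints a Name: line)
filter_active() {
    check=${CHECK_CMD:-'nslookup "$1" | grep -q "^Name:"'}
    # Each domain becomes $1 of a tiny sh script; the if keeps the exit
    # status zero so xargs does not report failures for inactive domains.
    xargs -P "${JOBS:-20}" -n 1 \
        sh -c "if $check; then printf '%s\\n' \"\$1\"; fi" _
}
```

It could then be used as: filter_active < cleaned_domains.txt > active.csv, where cleaned_domains.txt is a placeholder for the cleaned list. Sending many DNS queries in parallel can overload or get throttled by the resolver, so JOBS may need tuning.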