I have a list of 50k possible domain names. I'd like to find out which ones are available and, if possible, how much they cost. The list looks like this:
presumptuous.ly
principaliti.es
procrastinat.es
productivene.ss
professional.ly
profession.ally
professorshi.ps
prognosticat.es
prohibitioni.st
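Before checking anything, it can help to normalise the list so that no lookup is wasted on blank lines or repeats. The sketch below assumes the names are stored one per line in a text file. The helper names `parse_domains` and `load_domains` and the file name `domains.txt` are hypothetical. Lowercasing is safe because domain names are case-insensitive.

```python
from pathlib import Path
from typing import Iterable, List


def parse_domains(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, lowercase, drop blanks and duplicates (order kept)."""
    seen = {}
    for line in lines:
        name = line.strip().lower()
        if name:
            seen.setdefault(name, None)  # dict preserves first-seen order
    return list(seen)


def load_domains(path: str) -> List[str]:
    """Read a one-domain-per-line text file (hypothetical layout)."""
    return parse_domains(Path(path).read_text(encoding="utf-8").splitlines())


if __name__ == "__main__":
    print(parse_domains(["presumptuous.ly", "", "Principaliti.es", "presumptuous.ly"]))
    # ['presumptuous.ly', 'principaliti.es']
```

With a real file this would be called as `domains = load_domains("domains.txt")`.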
I've tried whois, but it runs far too slowly to finish within the next 100 years:
```python
import whois


def check_domain(domain):
    try:
        # Get the WHOIS information for the domain
        w = whois.whois(domain)
        if w.status == "free":
            return True
        else:
            return False
    except Exception as e:
        print("Error: ", e)
        print(domain + " had an issue")
        return False


def check_available(matches):
    print("checking availability")
    available = []
    for match in matches:
        # One blocking WHOIS query at a time, so total time grows linearly
        if check_domain(match):
            print("found " + match + " available!")
            available.append(match)
    return available
```
I've also tried the names.com/names bulk upload tool, but that doesn't seem to work at all.
How do I determine the availability of these domains?
You can use, for example, the multiprocessing package to speed up the process, e.g.:
```python
import os
import sys
from multiprocessing import Pool

import pandas as pd
from tqdm import tqdm
from whois import whois


# https://stackoverflow.com/a/8391735/10035985
def blockPrint():
    # Silence anything the whois package prints to stdout
    sys.stdout = open(os.devnull, "w")


def enablePrint():
    # Close the devnull handle and restore the real stdout
    sys.stdout.close()
    sys.stdout = sys.__stdout__


def check_domain(domain):
    try:
        blockPrint()
        result = whois(domain)
    except Exception:
        # Any error (including timeouts or rate limiting) is returned as a
        # None status and therefore reported as "free" below
        return domain, None
    finally:
        enablePrint()
    return domain, result.status


if __name__ == "__main__":
    domains = [
        "google.com",
        "yahoo.com",
        "facebook.com",
        "xxxnonexistentzzz.domain",
    ] * 100

    results = []
    with Pool(processes=16) as pool:  # <-- choose how many processes to use
        for domain, status in tqdm(
            pool.imap_unordered(check_domain, domains), total=len(domains)
        ):
            # An empty WHOIS status is taken to mean the domain is free
            results.append((domain, not bool(status)))

    df = pd.DataFrame(results, columns=["domain", "is_free"])
    print(df.drop_duplicates())
```
Prints:
```
100%|██████████████████████████████████████████████| 400/400 [00:07<00:00, 55.67it/s]
                      domain  is_free
0   xxxnonexistentzzz.domain     True
5               facebook.com    False
11                google.com    False
14                 yahoo.com    False
```
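The progress bar shows it checking roughly 55 domains per second. At that rate, 50,000 domains would take about 50,000 / 55 ≈ 900 seconds, i.e. around 15 minutes, although heavy querying may get you rate-limited by some WHOIS servers.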
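Two things in the code above can be tightened. First, WHOIS lookups mostly wait on the network, so a thread pool is an alternative to `multiprocessing.Pool` (this sketch swaps in `concurrent.futures.ThreadPoolExecutor`). Second, the answer treats every exception as "free", so a timeout looks like an available domain. The sketch below keeps errors as a separate verdict so they can be retried. It assumes a `lookup(domain)` callable that returns the WHOIS status (falsy when unregistered) and raises only on genuine failures. `check_all`, `classify` and `lookup` are hypothetical names. Some versions of python-whois may raise an exception for unregistered domains, in which case you would wrap `whois()` so that its "no match" error maps to `None`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def classify(lookup: Callable[[str], Optional[object]], domain: str) -> str:
    """Return 'registered', 'free' or 'error' for a single domain."""
    try:
        status = lookup(domain)
    except Exception:
        # Timeouts, refused connections, rate limits: unknown, retry later
        return "error"
    # Assumption: a falsy status means the registry has no record
    return "registered" if status else "free"


def check_all(
    domains: Iterable[str],
    lookup: Callable[[str], Optional[object]],
    workers: int = 16,
) -> Dict[str, str]:
    """Run lookups concurrently on threads; one verdict per unique domain."""
    unique = list(dict.fromkeys(domains))  # drop repeats, keep order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        verdicts = pool.map(lambda d: classify(lookup, d), unique)
        return dict(zip(unique, verdicts))


if __name__ == "__main__":
    # Fake lookup standing in for a real WHOIS call (no network needed)
    fake = {"google.com": ["clientTransferProhibited"], "xxxnonexistentzzz.domain": None}

    def fake_lookup(domain):
        if domain not in fake:
            raise TimeoutError(domain)
        return fake[domain]

    print(check_all(["google.com", "xxxnonexistentzzz.domain", "yahoo.com", "google.com"], fake_lookup))
    # {'google.com': 'registered', 'xxxnonexistentzzz.domain': 'free', 'yahoo.com': 'error'}
```

Domains that come back as `"error"` can simply be fed through `check_all` again later, instead of being silently counted as available.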