I'm trying to build an Internet search engine for school, using nothing but C# and the .NET Framework. I need to download the HTML of the pages I'm indexing.
The downloading part is straightforward (see the sketch below); now all I need is a list of valid URLs.
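For reference, the fetch step is roughly this (simplified; HttpClient needs .NET Framework 4.5 or later, and example.com is just a placeholder):

    // Fetch one page's HTML. Simplified: no timeouts, retries, or encoding handling.
    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    class Downloader
    {
        static readonly HttpClient Client = new HttpClient();

        static async Task Main()
        {
            // example.com stands in for whatever URL I'm indexing.
            string html = await Client.GetStringAsync("http://example.com/");
            Console.WriteLine(html.Length + " characters downloaded");
        }
    }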
Since I don't have a database of valid URLs, I wrote a trial-and-error algorithm that grows a string:
a, b, c.....
aa, ab, ac......
aaa, aab, aac......
aaaa, aaab, aaac......
aaaaa, aaaab, aaaac......
and then tries appending .com, .net, and so on to each candidate. This is far too inefficient - in code it's essentially the generator sketched below.
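Stripped down, the generator looks like this (the real version tries more TLDs; maxLength is just an illustration parameter):

    // Enumerate every lowercase name of length 1..maxLength, shortest first,
    // and append a TLD to each. The candidate count grows as 26^length,
    // which is why this approach is hopeless.
    using System.Collections.Generic;

    class BruteForce
    {
        // All lowercase strings of exactly the given length, in a..z order.
        static IEnumerable<string> Names(int length)
        {
            if (length == 0) { yield return ""; yield break; }
            foreach (string shorter in Names(length - 1))
                for (char c = 'a'; c <= 'z'; c++)
                    yield return shorter + c;
        }

        public static IEnumerable<string> Candidates(int maxLength)
        {
            string[] tlds = { ".com", ".net" };
            for (int len = 1; len <= maxLength; len++)
                foreach (string name in Names(len))
                    foreach (string tld in tlds)
                        yield return "http://" + name + tld;
        }
    }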
I need a database with valid URLs. Do you know where I can get one?
I also can't work out how to get them straight out of DNS - is that even possible?
You can build your own list. Most search engines crawl pages and follow the links they find to other pages.
You start with a known list of seed URLs (it doesn't have to be very big), then:
1. Take a URL off the list and download its HTML.
2. Extract the links from that HTML.
3. Add any URLs you haven't seen before to the list.
4. Repeat from step 1.
A rough C# sketch of this loop follows.
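Something like this (the regex-based link extraction and the example.com seed are simplifications; a real crawler would use a proper HTML parser, respect robots.txt, and add politeness delays):

    // A toy breadth-first crawler: take a URL from the queue, download it,
    // pull out links, and queue the ones we haven't seen yet.
    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Text.RegularExpressions;
    using System.Threading.Tasks;

    class Crawler
    {
        static readonly HttpClient Client = new HttpClient();

        static async Task Main()
        {
            var queue = new Queue<string>();
            var seen = new HashSet<string>();

            queue.Enqueue("http://example.com/");   // placeholder seed URL
            seen.Add("http://example.com/");

            while (queue.Count > 0 && seen.Count < 100)   // small cap for the sketch
            {
                string url = queue.Dequeue();

                string html;
                try { html = await Client.GetStringAsync(url); }
                catch (HttpRequestException) { continue; }   // skip dead pages

                Console.WriteLine("indexed: " + url);   // your indexer goes here

                // Crude link extraction: absolute http(s) URLs in href attributes.
                foreach (Match m in Regex.Matches(html, "href=\"(https?://[^\"]+)\""))
                {
                    string link = m.Groups[1].Value;
                    if (seen.Add(link))          // Add returns false for duplicates
                        queue.Enqueue(link);
                }
            }
        }
    }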
As for using DNS: it's not designed for querying URLs, only for resolving hostnames. And, as far as I know, you can't get a list of every hostname from a DNS server unless you manage the server yourself.
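For what it's worth, the most .NET's DNS support will give you is the addresses for a name you already know, e.g. (example.com is again a placeholder):

    using System;
    using System.Net;

    class DnsDemo
    {
        static void Main()
        {
            // DNS answers "what addresses does this name have?",
            // not "what names exist?".
            IPHostEntry entry = Dns.GetHostEntry("example.com");
            foreach (IPAddress addr in entry.AddressList)
                Console.WriteLine(addr);
        }
    }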