I'm using Sphider.
I want to allow Sphider to leave my domain http://www.example.com but only crawl/visit URLs containing example. That is, only URLs like http://www.example.com, http://www.my-example.com or http://www.test.example.com should get visited/indexed, but NOT http://www.exa-mple.com.
After reading the manual I tried the following: Screenshot of what I tried.
But I'm getting this message when trying to index: Image: What I'm getting when trying to index.
Can anyone help? What am I doing wrong? I also tried *example*, but that didn't work either.
The documentation contains a misleading example:
Every string starting with a '*' in front is considered as a regular expression, so that '*/[a]+/' denotes a string with one or more a's in it.
The [...] is a character class that matches any single character from the set/range defined inside it, so [a]+ is just a longer way of writing a+.
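To see what the pattern inside the slashes does on its own, here is a minimal sketch. It uses Python's re module as a stand-in for the regex engine Sphider relies on (an assumption; for a pattern this simple the behaviour should be the same), and the sample strings are made up for illustration.

```python
import re

# The part between the slashes in '*/[a]+/'.
# A one-character class [a] matches the same thing as a plain 'a'.
pattern = re.compile(r"[a]+")

# Hypothetical sample strings, not taken from the Sphider manual.
samples = ["banana", "xyz", "aaa"]

# search() looks for the pattern anywhere in the string.
matches = {s: bool(pattern.search(s)) for s in samples}
print(matches)
```

Any string with at least one a in it matches, wherever the a appears.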
You can use */example/ to define a regex that matches the string example anywhere in the URL.
However, if you are not interested in checking the context around it, you might as well put the plain string example in the must include list.
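As a rough illustration of the two options, the sketch below emulates a must-include check in Python. The function must_include_ok is hypothetical and is not Sphider's actual code. It assumes that a rule starting with '*' is a regex written between slashes and that any other rule is a plain substring, as the manual excerpt suggests.

```python
import re

def must_include_ok(url, rule):
    """Hypothetical stand-in for a must-include check; not Sphider's code."""
    if rule.startswith("*"):
        # Assumed convention: '*' marks a regex written as /pattern/.
        body = rule[1:]
        if len(body) < 2 or body[0] != "/" or body[-1] != "/":
            raise ValueError("expected a rule like */pattern/")
        return re.search(body[1:-1], url) is not None
    # Any other rule is treated as a plain substring.
    return rule in url

# URLs taken from the question.
urls = [
    "http://www.example.com",
    "http://www.my-example.com",
    "http://www.test.example.com",
    "http://www.exa-mple.com",
]

for rule in ("*/example/", "example"):
    allowed = [u for u in urls if must_include_ok(u, rule)]
    print(rule, allowed)
```

Both rules let the first three URLs through and reject http://www.exa-mple.com. Under the same assumption, a rule such as *example* would leave example* once the leading '*' is removed, which has no slash delimiters. That may be why it was rejected, though the error message itself is not reproduced here.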