Trouble with "URLs must include" with Sphider Search Engine


I'm using Sphider.

I want to allow Sphider to leave my domain http://www.example.com, but only crawl/visit URLs that contain the string example. That is, only URLs like http://www.example.com, http://www.my-example.com, or http://www.test.example.com should be visited and indexed, but NOT http://www.exa-mple.com.

After reading the manual I tried the following: [Screenshot: what I tried]

But I'm getting this message when trying to index: [Image: the message shown when indexing]

Can anyone help? What am I doing wrong? I also tried *example*, but that didn't work either.


Solution

  • The documentation contains a misleading example:

    Every string starting with a '*' in front is considered as a regular expression, so that '*/[a]+/' denotes a string with one or more a's in it.

    The [...] is a character class that matches any single character from a set/range defined inside it.

    You can use */example/ to define a regex that matches the string example anywhere in the URL. However, if you do not need to check the surrounding context, you might as well put the plain string example in the "URLs must include" list.
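
To see why both forms of the rule accept the same URLs, here is a minimal sketch in Python. It is not Sphider's code: the helper url_allowed is hypothetical, and it assumes only what the documentation quote says, namely that an entry starting with '*' is a regex written between / delimiters and that any other entry is a plain substring. It also assumes that a URL passes if it matches at least one entry. The patterns used here behave the same way in Python's re module as in PHP's PCRE functions.

```python
import re


def url_allowed(url, must_include):
    """Hypothetical helper: True if url matches at least one must-include entry.

    Assumed entry format (taken from the documentation quote, not from
    Sphider's source): '*' followed by a /delimited/ regex, or a plain string.
    """
    for entry in must_include:
        entry = entry.strip()
        if not entry:
            continue  # ignore blank lines in the list
        if entry.startswith("*"):
            pattern = entry[1:]  # drop the leading '*'
            # Drop the surrounding / delimiters, which Python's re does not use.
            if len(pattern) >= 2 and pattern[0] == "/" and pattern[-1] == "/":
                pattern = pattern[1:-1]
            # re.search is unanchored: the pattern may match anywhere in the URL.
            if re.search(pattern, url):
                return True
        elif entry in url:
            # Plain entry: simple substring test.
            return True
    return False


urls = [
    "http://www.example.com",
    "http://www.my-example.com",
    "http://www.test.example.com",
    "http://www.exa-mple.com",
]

for url in urls:
    # Compare the regex form with the plain-string form for each URL.
    print(url, url_allowed(url, ["*/example/"]), url_allowed(url, ["example"]))
```

Under these assumptions, the first three URLs pass with either entry and http://www.exa-mple.com is rejected by both, because the hyphen breaks the literal sequence example. A regex is only worth the extra syntax when the match depends on context, for instance when example must be followed by a particular top-level domain.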