I recently installed Sphider on my site. Installation was simple and so was indexing the pages, but I ran into a small issue.
I have a lot (seriously, loads) of pages on my site and many of them weren't indexed. One page takes a .csv file and builds a table with a foreach loop in PHP; the first column of each row is a hyperlink to a dedicated page for that item. The problem is that Sphider does not index these individual pages, only the table page. I'm in a right two and eight because I have no idea why these pages are not indexed.
I checked to see if I had any but I didn't and I even set Sphider to index a random one of the individual pages from the table and it appeared in the search. I'd do this with all the pages but I keep adding new pages every time we get a new item so I would get inundated with things to add to the index list.
My question is this: is there some way to have a script add each URL to Sphider's database, seeing as that seems to make them appear? Or am I being a complete div and missing something really obvious, maybe something that goes wrong because of the .csv PHP table?
I would really appreciate your help because I am completely confused.
Thanks, Carty
PS: What's the standard for including a tl;dr? Is that just for Redditors? :P
I had a similar problem when I first started using Sphider. When I tried to spider a folder on my website, e.g. www.mysite.com/myfolder, which contained 900 different HTML pages, it would only spider and list in the database one link: www.mysite.com/myfolder.
I figured out that Sphider won't spider a whole directory if there is an 'index.html', 'home.html' or 'index.php' file in that folder.
So I temporarily deleted my index.html file, successfully spidered all 900 HTML files, then re-uploaded my index.html.
If index and home files are not the cause, it might be that your spidering link depth setting is not high enough.
Lastly, Sphider respects the rel="nofollow" attribute on links, so it won't follow or index any links marked that way.
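If you want to rule out the nofollow cause quickly, here is a rough Python sketch that lists which links in a page's HTML carry rel="nofollow". It is not part of Sphider and does not touch its database; the `audit_links` function, the `LinkAudit` class and the sample markup are all made up for illustration, so paste in the real output of your table page instead.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collects every <a href> and sorts it by whether rel contains "nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []   # hrefs a nofollow-respecting crawler may follow
        self.nofollow = []   # hrefs marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return  # anchors without a target are irrelevant here
        # rel is a space-separated token list; compare case-insensitively
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def audit_links(html):
    """Return (followed, nofollow) lists of hrefs found in the given HTML."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollow


if __name__ == "__main__":
    # Hypothetical sample standing in for the generated .csv table page.
    sample = (
        '<table>'
        '<tr><td><a href="item.php?id=1">Item 1</a></td></tr>'
        '<tr><td><a rel="nofollow" href="item.php?id=2">Item 2</a></td></tr>'
        '</table>'
    )
    followed, nofollow = audit_links(sample)
    print("followed:", followed)
    print("nofollow:", nofollow)
```

If every item link ends up in the nofollow list, that would explain why only the table page gets indexed. If they all show up as followed, the index file or link depth explanations above are the more likely suspects.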
Hope this helps.