I want to build a project that parses wiki pages and extracts the information I need from them. I have looked at some crawlers and DOM parsers, such as Apache Nutch and the Simple HTML DOM parser, but parsing wiki pages with core PHP is very slow.
But I can't figure out:
Which tools can I use to get the best, most optimised results?
How do I integrate a crawler like Nutch with PHP?
How do I store the data fetched by the crawler in MySQL?
How should I organise the data fetched by the crawler?
What level of regular expressions do I need to learn?
I am new to crawler-type projects.
Thanks in advance for your valuable time. I don't know why people closed my question; please reopen it.
There is a built-in MediaWiki API available on Wikipedia, and there are some PHP examples of its usage.
The web service API provides direct, high-level access to the data contained in MediaWiki databases. Client programs can log in to a wiki, get data, and post changes automatically by making HTTP requests to the web service.
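As a minimal sketch of what that could look like for your case: the snippet below fetches one page's wikitext through Wikipedia's standard api.php endpoint, then stores it in MySQL via PDO. The table name `wiki_pages`, its columns, and the database credentials are hypothetical placeholders, not anything prescribed by the API.

<?php
// Minimal sketch, assuming PHP with allow_url_fopen enabled: fetch a page's
// wikitext via the MediaWiki API, then insert it into a hypothetical MySQL
// table wiki_pages(title, wikitext) using PDO.

$endpoint = 'https://en.wikipedia.org/w/api.php';
$params = http_build_query([
    'action'  => 'query',
    'prop'    => 'revisions',   // request page content (revisions)
    'rvprop'  => 'content',
    'rvslots' => 'main',
    'titles'  => 'PHP',         // page to fetch; change as needed
    'format'  => 'json',
]);

// Wikipedia asks clients to send a descriptive User-Agent header.
$context = stream_context_create([
    'http' => ['header' => "User-Agent: MyWikiParser/0.1 (you@example.com)\r\n"],
]);

$json = file_get_contents("$endpoint?$params", false, $context);
$data = json_decode($json, true);

// Assumed local credentials; replace with your own.
$pdo  = new PDO('mysql:host=localhost;dbname=wiki;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO wiki_pages (title, wikitext) VALUES (?, ?)');

foreach ($data['query']['pages'] as $page) {
    $wikitext = $page['revisions'][0]['slots']['main']['*'];
    $stmt->execute([$page['title'], $wikitext]);
}

Because the API returns structured JSON (titles, revisions, links, and so on as ready-made fields), you generally need far less regular-expression work than you would scraping the raw HTML.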